Abstract

Indirect inference estimators (i.e., simulation-based minimum distance estimators) in a 
parametric model that are based on auxiliary non-parametric maximum likelihood density 
estimators are shown to be asymptotically normal. If the parametric model is correctly 
specified, it is furthermore shown that the asymptotic variance-covariance matrix equals the 
inverse of the Fisher-information matrix. These results are based on uniform-in-parameters 
convergence rates and a uniform-in-parameters Donsker-type theorem for non-parametric 
maximum likelihood density estimators. 

1 Introduction 

Suppose $X_1, \ldots, X_n$ are independent and identically distributed (i.i.d.) random variables with
law $P$. Furthermore, we are given a parametric model $\mathcal{P}_\Theta = \{p_\theta : \theta \in \Theta\}$ of probability density
functions $p_\theta$ and $\Theta \subseteq \mathbb{R}^m$. Assume for the moment that $\mathcal{P}_\Theta$ is correctly specified and identifiable
in the sense that there is a unique $\theta_0 \in \Theta$ such that $p_{\theta_0}$ is a density of $P$. A standard method
of estimation of $\theta_0$ is then the maximum likelihood method, which under appropriate regularity
conditions is known to lead to asymptotically efficient estimators. However, in a number of
models, e.g., in econometrics and biostatistics, the maximum likelihood method may not be
feasible as no closed form expressions for the densities $p_\theta$, and thus for the likelihood, are available.
For example, the data may be modeled by an equation of the form $X_i = g(\varepsilon_i, \theta_0)$ where the $\varepsilon_i$ are
i.i.d. with a known distribution but the implied parametric densities are not analytically tractable
because $g$ is complicated or $\varepsilon_i$ is high-dimensional. A similar problem naturally also occurs in the
estimation of dynamic nonlinear models; see Smith (1993), Gouriéroux, Monfort and Renault
(1993), Gallant and Tauchen (1996), Gouriéroux and Monfort (1996), and Gallant and Long
(1997) for several concrete examples. This has led to the development of alternative estimation
methods like the so-called indirect inference method; see the just mentioned references as well
as Jiang and Turnbull (2004). Ideally, these estimation methods should also be asymptotically
efficient. In our context these methods can be described in a nutshell as follows:



1. Simulate a random sample $X_1(\theta), \ldots, X_k(\theta)$ of size $k$ from the density $p_\theta$ for $\theta \in \Theta$. [This
is often possible in the examples alluded to above, e.g., by perusing the equations defining
the model. Note that then only the disturbances $\varepsilon_1, \ldots, \varepsilon_k$ have to be simulated once and
$X_i(\theta)$ can be computed from $g(\varepsilon_i, \theta)$ for any given $\theta$.]

2. Based on the simulated sample as well as on the true data, compute auxiliary estimators
$\tilde p_k(\theta)$ and $\hat p_n$, respectively, in a not necessarily correctly specified but numerically
tractable auxiliary model $\mathcal{M}^{aux}$. [For example, by maximum likelihood if $\mathcal{M}^{aux}$ is
finite-dimensional.]

3. With a suitable choice of a distance $\chi$ then estimate $\theta_0$ by minimizing over $\Theta$ the objective
function $\theta \mapsto \chi(\hat p_n, \tilde p_k(\theta))$.

*This paper is based on the doctoral thesis of the first author written under the supervision of the second
author. The authors are grateful to Richard Nickl for many discussions and for helpful comments on the paper.
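Steps 1 to 3 can be sketched numerically as follows. This is only an illustration under assumptions that are not part of the paper: the structural equation `g` is a hypothetical location shift, the auxiliary density estimator is a plain histogram (a stand-in, not the NPML estimator studied below), the distance is a squared $L^2$ distance, and the minimization is a grid search.

```python
import numpy as np

def g(eps, theta):
    # Hypothetical structural equation X = g(eps, theta): a pure location shift.
    return theta + eps

def hist_density(sample, edges):
    # Stand-in auxiliary density estimator (a histogram, NOT the NPML estimator).
    counts, _ = np.histogram(sample, bins=edges)
    return counts / (len(sample) * np.diff(edges))

def objective(theta, p_data, eps_sim, edges):
    # Squared L2 distance between the data-based and simulation-based estimates.
    p_sim = hist_density(g(eps_sim, theta), edges)
    return float(np.sum((p_data - p_sim) ** 2 * np.diff(edges)))

rng = np.random.default_rng(0)
theta0, n, k = 0.5, 5000, 50000
x = g(rng.standard_normal(n), theta0)    # observed data
eps_sim = rng.standard_normal(k)         # Step 1: disturbances are simulated once
edges = np.linspace(-5.0, 6.0, 45)
p_data = hist_density(x, edges)          # Step 2 on the true data
grid = np.linspace(-1.0, 2.0, 301)
values = [objective(th, p_data, eps_sim, edges) for th in grid]  # Step 2 on simulated data
theta_hat = grid[int(np.argmin(values))]  # Step 3: minimize the distance over the grid
print(theta_hat)
```

Note that `eps_sim` is drawn a single time and reused for every candidate value of `theta`, which is the point made in the bracketed remark of Step 1.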

In most of the indirect inference literature, the auxiliary model $\mathcal{M}^{aux}$ is assumed to be
finite-dimensional, indexed by a vector $\beta \in B$ with $B$ a subset of a Euclidean space, say, and one then in fact minimizes a
distance between $\hat\beta_n$, the maximum likelihood estimator in the auxiliary model computed from
the original data, and $\tilde\beta_k(\theta)$, the maximum likelihood estimator in the auxiliary model computed
from the simulated sample $X_1(\theta), \ldots, X_k(\theta)$. The resulting indirect inference estimator can be
shown to be consistent and asymptotically normal (under standard regularity conditions, see
Gouriéroux and Monfort (1996)). However, the indirect inference estimator is asymptotically
efficient (in the sense of having the inverse of the Fisher-information matrix as its asymptotic
variance-covariance matrix) only if $\mathcal{M}^{aux}$ happens to be correctly specified. This assumption is
certainly restrictive and often unnatural if $\mathcal{M}^{aux}$ is of fixed finite dimension. Therefore Gallant
and Long (1997) suggested that choosing $\mathcal{M}^{aux}$ with dimension increasing in sample size should
result in estimators that are asymptotically efficient, the idea being that this essentially amounts
to choosing an infinite-dimensional auxiliary model $\mathcal{M}^{aux}$ for which the assumption of correct
specification is much less restrictive. In particular, Gallant and Long (1997) set out to study the
case where the density estimators are based on non-parametric maximum likelihood estimators
over sieves spanned by Hermite-polynomials, but their limiting result is only informative if the
sieve dimension stays bounded (so that efficiency of the estimator is only established if the true
density is a finite linear combination of Hermite-polynomials), bringing one back into the realm
of finite-dimensional auxiliary models.

In the present paper we show in some generality that the suggestion in Gallant and Long
(1997) is indeed correct, namely that the indirect inference estimator for $\theta_0$ is asymptotically
normal with the inverse of the Fisher-information matrix as its asymptotic variance-covariance
matrix if the auxiliary estimators $\tilde p_k(\theta)$ and $\hat p_n$ in Step 2 are chosen to be non-parametric
maximum likelihood (NPML) estimators obtained from optimizing the non-parametric likelihood over
suitable bounded subsets of a Sobolev-space and if the size $k$ of the simulated sample is of order
larger than n^. Furthermore, we show that asymptotic normality persists even if the originally
given model $\mathcal{P}_\Theta$ is misspecified. [We do not explicitly consider sieved NPMLs, although analogous
results for such estimators are certainly possible. This would require a uniform-in-parameters
extension of the results in Nickl (2009), paralleling the extension of Nickl (2007) provided in the
present paper.]

We now comment on some related literature in the area of indirect inference: Fermanian and
Salanié (2004) propose a different procedure and establish asymptotic efficiency of their
estimators under several high-level conditions, which, as they admit themselves, are very stringent. For
example, even in the simplest model they consider, they need to have simulations of order k ^ n^. 
Nickl and Pötscher (2010) consider the case where $\tilde p_k(\theta)$ and $\hat p_n$ are not NPML estimators but are
spline projection estimators and they establish asymptotic normality and asymptotic efficiency
if the parametric model $\mathcal{P}_\Theta$ is correctly specified. In contrast to the present paper, Nickl and
Pötscher (2010) also analyze the case where $k$, the size of the simulated sample, is not necessarily
of order larger than n^. We discuss this in more detail in Remark 21 in Section 5. There are
also some other related recent papers on this topic, Altissimo and Mele (2009) and Carrasco,
Chernov, Florens, and Ghysels (2007), whose proofs, however, we were not able to follow.

In the present paper we shall use for $\chi$ the Fisher-metric, hence the objective function defining
the indirect inference estimator will be given by



It transpires that the indirect inference estimators considered in the present paper can be viewed
as minimum distance estimators with the important (and nontrivial) modification that $p_\theta$ has
been replaced by an estimator $\tilde p_k(\theta)$ based on the simulated data. In that sense our results can
be viewed as an extension of Beran's (1977) asymptotic efficiency result for classical minimum
distance estimators to the case of simulation-based minimum distance estimators, the simulation
step introducing considerable additional complexity into the proofs.

In order to establish the above mentioned results for the indirect inference estimator a careful
study of several aspects of the NPML-estimators $\tilde p_k(\theta)$ and $\hat p_n$ is required. In particular, it turns
out to be beneficial to establish the weak convergence of the stochastic process

$$(\theta, f) \mapsto \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda \qquad (2)$$

to a Gaussian process in $\ell^\infty(\Theta \times \mathcal{F})$ where $\mathcal{F}$ is an appropriate class of functions. This result can
be seen to imply a uniform-in-$\theta$ version of a Donsker-type result for NPML-estimators obtained
recently by Nickl (2007). In the course of establishing this weak convergence result it is also
necessary to derive rates of convergence for

$$\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{s,2} \qquad (3)$$

where the norm is a suitable Sobolev-norm.

The outline of the paper is as follows: After some preliminaries in Section 2 we introduce
the model and assumptions in Section 3. In Section 4.1 we derive existence and uniqueness
of the NPML-estimator while rates of convergence as indicated in (3) are given in Section 4.2.
Donsker-type theorems like (2) are the subject of Section 4.3. In contrast to Nickl (2007), we
avoid an assumption that requires all densities to be bounded away from zero in our results as far
as possible. Section 5 introduces simulation-based minimum distance estimators (i.e., indirect
inference estimators) based on auxiliary NPML-estimators and establishes asymptotic normality
of these estimators even if the originally given parametric model $\mathcal{P}_\Theta$ is misspecified. If $\mathcal{P}_\Theta$ is
correctly specified, it is furthermore shown that the estimator is asymptotically efficient in the
sense that its asymptotic variance-covariance matrix equals the inverse of the Fisher-information
matrix. Some proofs and technical results are collected in the appendices.

2 Preliminaries and Notation

For $\Lambda$ a non-empty set and $f$ a real-valued function on $\Lambda$, define $\|f\|_\Lambda = \sup_{x \in \Lambda} |f(x)|$ and let
$\ell^\infty(\Lambda)$ denote the Banach space of all bounded real-valued functions on $\Lambda$, equipped with the
sup-norm $\|\cdot\|_\Lambda$. If $\mathcal{D}$ is a (non-empty) subset of $\ell^\infty(\Lambda)$ we shall write $(\mathcal{D}, \|\cdot\|_\Lambda)$ to denote
the metric space $\mathcal{D}$ with the induced metric $\|f - g\|_\Lambda$. For $(\Lambda, \mathcal{A})$ a (non-empty) measurable
space, let $\mathcal{L}^0(\Lambda, \mathcal{A})$ denote the vector space of all $\mathcal{A}$-measurable real-valued functions on $\Lambda$ and
define the Banach space $\mathsf{L}^\infty(\Lambda, \mathcal{A}) = \mathcal{L}^0(\Lambda, \mathcal{A}) \cap \ell^\infty(\Lambda)$, again equipped with the sup-norm. For
$f \in \mathcal{L}^0(\Lambda, \mathcal{A})$ and $\mu$ a non-negative measure on $(\Lambda, \mathcal{A})$, define $\|f\|_{2,\mu} = \left(\int_\Lambda |f|^2 \, d\mu\right)^{1/2}$ and
$\mathcal{L}^2(\Lambda, \mathcal{A}, \mu) = \{f \in \mathcal{L}^0(\Lambda, \mathcal{A}) : \|f\|_{2,\mu} < \infty\}$. For the measure space $(\Omega, \mathcal{B}(\Omega), \lambda)$, where $\Omega$ is a
(non-empty) measurable subset of the real line $\mathbb{R}$ with associated Borel $\sigma$-field $\mathcal{B}(\Omega)$ and where
$\lambda$ is Lebesgue measure, we shall simplify notation and write $\mathcal{L}^0(\Omega)$, $\mathcal{L}^2(\Omega)$, $\mathsf{L}^\infty(\Omega)$, and $\|\cdot\|_2$
for $\mathcal{L}^0(\Omega, \mathcal{B}(\Omega))$, $\mathcal{L}^2(\Omega, \mathcal{B}(\Omega), \lambda)$, $\mathsf{L}^\infty(\Omega, \mathcal{B}(\Omega))$, and $\|\cdot\|_{2,\lambda}$, respectively. Furthermore, we shall
write a.e. instead of $\lambda$-a.e. For any (non-empty) metric space $(T, d)$, we denote by $\mathcal{B}(T, d)$, or
simply $\mathcal{B}(T)$, its Borel $\sigma$-field and by $\mathsf{C}(T, d)$, or simply $\mathsf{C}(T)$, the Banach space of all bounded,
$d$-continuous real-valued functions on $T$, equipped with the sup-norm.

We shall denote by $\|\cdot\|$ the 2-norm on Euclidean space. For two real-valued functions $f$ and $g$
on $(0, \infty)$, we shall write $f(\varepsilon) \lesssim g(\varepsilon)$ if there is a constant $C$, $0 < C < \infty$, such that $f(\varepsilon) \le C g(\varepsilon)$
holds true for all $\varepsilon > 0$. It will also prove useful to define $\log \infty = \infty$ and $\log 0 = -\infty$, thus
making the logarithm a continuous function from $[0, \infty]$ to $[-\infty, \infty]$.

Let $(\Lambda_0, \mathcal{A}_0, P_0)$, $(\Lambda_n, \mathcal{A}_n, P_n)$, $n \ge 1$, be probability spaces. Suppose $Y_0 : \Lambda_0 \to T$ is an
$\mathcal{A}_0$-$\mathcal{B}(T, d)$-measurable mapping and $Y_n : \Lambda_n \to T$ are (not necessarily measurable) mappings,
where $(T, d)$ is a metric space. We say that $Y_n$ converges weakly to $Y_0$ in $(T, d)$, denoted by
$Y_n \rightsquigarrow Y_0$, if the outer integrals $\int^* g(Y_n) \, dP_n$ converge to $\int_{\Lambda_0} g(Y_0) \, dP_0$ for every $g \in \mathsf{C}(T, d)$;
furthermore, $Y_n$ is said to converge weakly to a Borel probability measure $L$ on $(T, \mathcal{B}(T, d))$,
denoted by $Y_n \rightsquigarrow L$, if $\int^* g(Y_n) \, dP_n$ converges to $\int_T g \, dL$ for every $g \in \mathsf{C}(T, d)$. We say that $Y_n$
converges to $\tau \in T$ in outer $P_n$-probability if $P_n^*(d(Y_n, \tau) > \varepsilon)$ converges to $0$ for all $\varepsilon > 0$. If $Y_n$
are real-valued and $r_n$ is a sequence of positive real numbers, we write $Y_n = o^*_{P_n}(r_n)$ if $r_n^{-1} Y_n$
converges to $0$ in outer $P_n$-probability, and $Y_n = O^*_{P_n}(r_n)$ if

$$\lim_{M \to \infty} \limsup_{n \to \infty} P_n^*\left(r_n^{-1} |Y_n| > M\right) = 0.$$

In case the probability spaces $(\Lambda_n, \mathcal{A}_n, P_n)$ are the $n$-fold products of a single probability space
$(\Lambda, \mathcal{A}, P)$, that is, $(\Lambda_n, \mathcal{A}_n, P_n) = (\Lambda^n, \mathcal{A}^n, P^n)$, we write $Y_n = o^*_P(r_n)$ instead of $Y_n = o^*_{P^n}(r_n)$
and $Y_n = O^*_P(r_n)$ for $Y_n = O^*_{P^n}(r_n)$.



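2.1 Hölder and Sobolev Spaces

For $\Omega$ a (non-empty) open subset of $\mathbb{R}$, a function $f : \Omega \to \mathbb{R}$, and $s \ge 0$, define

$$\|f\|_{s,\Omega} = \begin{cases} \sum_{0 \le \alpha \le \lfloor s \rfloor} \|f^{(\alpha)}\|_\Omega + \sup_{x, y \in \Omega,\, x \ne y} \dfrac{|f^{(\lfloor s \rfloor)}(x) - f^{(\lfloor s \rfloor)}(y)|}{|x - y|^{s - \lfloor s \rfloor}} & \text{if } s \text{ is non-integer,} \\[2ex] \sum_{0 \le \alpha \le s} \|f^{(\alpha)}\|_\Omega & \text{otherwise.} \end{cases}$$

Here $f^{(\alpha)}$ denotes the classical derivative of $f$ of order $\alpha$, and $\lfloor s \rfloor$ denotes the integer part of $s$.
For any non-integer $s > 0$, define the Hölder space $\mathsf{C}^s(\Omega)$ as the space of all $f : \Omega \to \mathbb{R}$ such that
$\|f\|_{s,\Omega} < \infty$; for any integer $s \ge 0$, let $\mathsf{C}^s(\Omega)$ be the space of all $f : \Omega \to \mathbb{R}$ such that $\|f\|_{s,\Omega} < \infty$
and $f^{(s)}$ is uniformly continuous. Note that $\mathsf{C}^0(\Omega)$ thus is the space of bounded and uniformly
continuous functions on $\Omega$.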

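For $\Omega$ and $s$ as above and functions $f, g \in \mathcal{L}^2(\Omega)$, let

$$(f|g)_{s,2} = \begin{cases} \sum_{0 \le \alpha \le \lfloor s \rfloor} (D^\alpha f | D^\alpha g)_2 + \displaystyle\int_\Omega \int_\Omega \frac{\left(D^{\lfloor s \rfloor} f(x) - D^{\lfloor s \rfloor} f(y)\right)\left(D^{\lfloor s \rfloor} g(x) - D^{\lfloor s \rfloor} g(y)\right)}{|x - y|^{1 + 2(s - \lfloor s \rfloor)}} \, dx \, dy & \text{if } s \text{ is non-integer,} \\[2ex] \sum_{0 \le \alpha \le s} (D^\alpha f | D^\alpha g)_2 & \text{otherwise,} \end{cases}$$

and set $\|f\|_{s,2} = \sqrt{(f|f)_{s,2}}$. Here, $D^\alpha f$ denotes the weak derivative of $f$ of order $\alpha$, and
$(\cdot|\cdot)_2$ is the usual (semi)inner product on $\mathcal{L}^2(\Omega)$. Define $\mathcal{W}_2^s(\Omega)$ as the space of all $f \in \mathcal{L}^2(\Omega)$
such that $\|f\|_{s,2}$ is finite. As usual, we equip $\mathcal{W}_2^s(\Omega)$ with the (semi)norm $\|\cdot\|_{s,2}$. For $s >
1/2$ and $\Omega$ a non-empty bounded open interval in $\mathbb{R}$, each $f \in \mathcal{W}_2^s(\Omega)$ is a.e. equal to exactly
one bounded continuous function on $\Omega$. For $s > 1/2$ and such $\Omega$, we consequently define the
Sobolev space $\mathsf{W}_2^s(\Omega) = \mathcal{W}_2^s(\Omega) \cap \mathsf{C}(\Omega)$ and note that it is a Hilbert space. The Sobolev balls
$\{f \in \mathsf{W}_2^s(\Omega) : \|f\|_{s,2} \le B\}$ of radius $B$, $0 < B < \infty$, will be denoted by $\mathcal{U}_{s,B}$, and their translates
$g + \mathcal{U}_{s,B}$ by $\mathcal{U}_{s,B}(g)$. The next proposition collects some properties of Sobolev spaces; see Gach
and Pötscher (2010) for a proof.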

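Proposition 1 Let $\Omega$ be a non-empty bounded, open interval in $\mathbb{R}$.

(a) For $s > 1/2$, the Sobolev space $\mathsf{W}_2^s(\Omega)$ is a multiplication algebra; that is, there is a finite
constant $M_s > 0$ such that

$$\|fg\|_{s,2} \le M_s \|f\|_{s,2} \|g\|_{s,2}$$

holds true for all $f, g \in \mathsf{W}_2^s(\Omega)$.

(b) For $s > 1/2$, the Sobolev space $\mathsf{W}_2^s(\Omega)$ is continuously embedded in $\mathsf{C}^{s-1/2}(\Omega)$.
Consequently, $\mathsf{W}_2^s(\Omega)$ is embedded in $\mathsf{C}(\Omega)$ with an embedding constant $C_s$, $0 < C_s < \infty$; that is,

$$\|f\|_\Omega \le C_s \|f\|_{s,2}$$

holds true for all $f \in \mathsf{W}_2^s(\Omega)$.

(c) If $0 \le r < s$, then $\mathsf{W}_2^s(\Omega)$ is compactly embedded in $\mathcal{W}_2^r(\Omega)$; if $1/2 < r < s$, then $\mathsf{W}_2^s(\Omega)$
is compactly embedded in $\mathsf{W}_2^r(\Omega)$.

(d) If $\mathcal{F}$ is a (non-empty) bounded subset of some Sobolev space $\mathsf{W}_2^s(\Omega)$ of order $s > 1/2$ such
that $\inf_{x \in \Omega, f \in \mathcal{F}} |f(x)| > 0$ holds, then $\{1/f : f \in \mathcal{F}\}$ is also a bounded subset of $\mathsf{W}_2^s(\Omega)$.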

2.2 Covering Numbers and Metric Entropy 

Let $(T, d)$ be a metric space. Let $0 < \varepsilon < \infty$ and let $X$ be a (non-empty) totally bounded subset
of $T$. Then we denote by $N(\varepsilon, X, T, d)$ the covering number of $X$, i.e., the minimal number of
closed balls in $T$ of radius $\varepsilon$ needed to cover $X$; we define the metric entropy of $X$ as

$$H(\varepsilon, X, T, d) = \log N(\varepsilon, X, T, d).$$

If $T$ is a normed space with norm $\|\cdot\|$, we shall write in abuse of notation $N(\varepsilon, X, T, \|\cdot\|)$ and
similarly for the metric entropy.
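As a simple illustration of these definitions (not taken from the paper), take $T = \mathbb{R}$ with the usual distance and let $X$ be an interval of length $\ell > 0$. Closed balls of radius $\varepsilon$ are then closed intervals of length $2\varepsilon$, so the covering number is $\lceil \ell / (2\varepsilon) \rceil$. The function names below are hypothetical.

```python
import math

def covering_number_interval(length, eps):
    # Minimal number of closed balls of radius eps in R (closed intervals of
    # length 2*eps) needed to cover an interval of the given positive length.
    return max(1, math.ceil(length / (2.0 * eps)))

def metric_entropy_interval(length, eps):
    # Metric entropy is the logarithm of the covering number.
    return math.log(covering_number_interval(length, eps))

for eps in (1.0, 0.25, 0.125):
    print(eps, covering_number_interval(1.0, eps), metric_entropy_interval(1.0, eps))
```

The covering number grows like $1/\varepsilon$ here, so the metric entropy grows only logarithmically; for infinite-dimensional sets such as Sobolev balls the entropy grows polynomially in $1/\varepsilon$ instead.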

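Let $(\Lambda, \mathcal{A}, \mu)$ be a (non-empty) measure space. For any two elements $l, u \in \mathcal{L}^0(\Lambda, \mathcal{A})$, the set

$$[l, u] = \{f \in \mathcal{L}^0(\Lambda, \mathcal{A}) : l(x) \le f(x) \le u(x) \text{ for all } x \in \Lambda\}$$

is called a bracket and $\|u - l\|_{2,\mu}$ its $\mathcal{L}^2(\mu)$-bracketing size. For $0 < \varepsilon < \infty$ and $\mathcal{F}$ a (non-empty)
subset of $\mathcal{L}^0(\Lambda, \mathcal{A})$, we define $N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|_{2,\mu})$ to be the minimal number of brackets of
$\mathcal{L}^2(\mu)$-bracketing size less than or equal to $\varepsilon$ needed to cover $\mathcal{F}$; if there is no finite number of such
brackets, we set $N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|_{2,\mu}) = \infty$ for convenience. The $\mathcal{L}^2(\mu)$-bracketing metric entropy of
$\mathcal{F}$ is defined as

$$H_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|_{2,\mu}) = \log N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|_{2,\mu}).$$

Furthermore, for $0 < \eta < \infty$ the $\mathcal{L}^2(\mu)$-bracketing metric integral $I_{[\,]}(\eta, \mathcal{F}, \|\cdot\|_{2,\mu})$ of $\mathcal{F}$ is given
by

$$I_{[\,]}(\eta, \mathcal{F}, \|\cdot\|_{2,\mu}) = \int_0^\eta \sqrt{1 + H_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|_{2,\mu})} \, d\varepsilon.$$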
3 The Framework and Assumptions

From now on let $\Omega$ be a non-empty bounded, open interval in $\mathbb{R}$. We consider i.i.d. random
variables $(X_i)_{i \in \mathbb{N}}$ that take their values in $(\Omega, \mathcal{B}(\Omega))$ and have common law $P$, with $X_1, \ldots, X_n$
representing the data at sample size $n$. Furthermore, let $\Theta$ be a (non-empty) compact subset
of $\mathbb{R}^m$ and let $\mathcal{P}_\Theta = \{p_\theta : \theta \in \Theta\}$ be a parametric family of probability density functions $p_\theta$
on $\Omega$. The law $P$ may or may not correspond to a density in $\mathcal{P}_\Theta$. We assume that there is
a way of simulating synthetic data according to the densities in the class $\mathcal{P}_\Theta$ in the following
sense: There is a probability space $(V, \mathcal{V}, \mu)$ and a function $\rho : V \times \Theta \to \Omega$, which is
$\mathcal{V}$-$\mathcal{B}(\Omega)$-measurable in its first argument, such that for every $\theta \in \Theta$ the law of $\rho(\cdot, \theta)$ under $\mu$ has density
$p_\theta$. Consequently, if $(V_i)_{i \in \mathbb{N}}$ is a sequence of i.i.d. random variables with values in $(V, \mathcal{V})$ and
law $\mu$, then $X_i(\theta) = \rho(V_i, \theta)$ is an i.i.d. sequence with law having density $p_\theta$, simultaneously so
for all $\theta \in \Theta$. We shall also always assume that the process $(V_i)_{i \in \mathbb{N}}$ is independent of $(X_i)_{i \in \mathbb{N}}$.
[As indicated in the Introduction, the simulation mechanism $\rho$ may derive from an underlying
equation model, but it may also arise in some other way.] In the application to indirect inference
in Section 5 we shall estimate $\theta$ by matching a non-parametric estimator for (the density of) $P$
obtained from the data $X_1, \ldots, X_n$ with a non-parametric estimator for $p_\theta$ obtained from the
synthetic data $X_1(\theta), \ldots, X_k(\theta)$. We stress that construction of the synthetic data requires only
one simulation, and not a separate simulation for every $\theta$. For convenience we shall from now
on assume that the random variables $X_i$ and $V_i$ are the respective coordinate projections on the
measurable space $(\Omega^{\mathbb{N}} \times V^{\mathbb{N}}, \mathcal{B}(\Omega)^{\mathbb{N}} \otimes \mathcal{V}^{\mathbb{N}})$ equipped with the product measure $\Pr := P^{\mathbb{N}} \otimes \mu^{\mathbb{N}}$. We
note, however, that all results of the paper hold also without this assumption; see Remark 17.
Furthermore, the empirical measures associated with $X_1, \ldots, X_n$ and $V_1, \ldots, V_k$ will be denoted
by $\mathbb{P}_n$ and $\mu_k$, respectively.

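The density estimators we shall consider will be NPML-estimators over non-parametric
models (called auxiliary models in Section 5) of the form

$$\mathcal{P}(t, \zeta, D) = \left\{ p \in \mathsf{W}_2^t(\Omega) : \inf_{x \in \Omega} p(x) \ge \zeta, \ \int_\Omega p \, d\lambda = 1, \ \|p\|_{t,2} \le D \right\}$$

where $t > 1/2$, $0 \le \zeta < \infty$, and $0 < D < \infty$. Some important properties of $\mathcal{P}(t, \zeta, D)$ that will
be used repeatedly are summarized in the subsequent propositions, the proofs of which can be
found in Appendix A.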

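Proposition 2 Suppose $t > 1/2$, $0 \le \zeta < \infty$, and $0 < D < \infty$.

(a) The following statements are equivalent: (i) $\zeta \le \lambda(\Omega)^{-1} \le D^2$; (ii) the constant density
$\lambda(\Omega)^{-1}$ belongs to $\mathcal{P}(t, \zeta, D)$; (iii) $\mathcal{P}(t, \zeta, D)$ is non-empty.

(b) Suppose $\zeta \le \lambda(\Omega)^{-1} \le D^2$. Then the following statements are equivalent: (i) $\zeta = \lambda(\Omega)^{-1}$
or $\lambda(\Omega)^{-1} = D^2$; (ii) the constant density $\lambda(\Omega)^{-1}$ is the only element of $\mathcal{P}(t, \zeta, D)$; (iii) $\mathcal{P}(t, \zeta, D)$
is a singleton.

(c) Suppose $\zeta \le \lambda(\Omega)^{-1} \le D^2$. Then $\mathcal{P}(t, \zeta, D)$ is a non-empty convex set, which is compact
in $\mathsf{C}(\Omega)$ as well as in $\mathsf{W}_2^s(\Omega)$ for every $s$ satisfying $1/2 < s < t$.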
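The equivalence of (i) and (ii) in Part (a) can be checked by hand. The following short computation is a sketch under two assumptions: that membership in $\mathcal{P}(t, \zeta, D)$ means integrating to one, being bounded below by $\zeta$, and having Sobolev norm at most $D$; and that all terms of the Sobolev norm other than the $\mathcal{L}^2$-norm involve derivatives or differences of derivatives, which vanish for a constant. Writing $c = \lambda(\Omega)^{-1}$ for the constant density,

$$\|c\|_{t,2}^2 = \int_\Omega c^2 \, d\lambda = \lambda(\Omega)^{-2} \, \lambda(\Omega) = \lambda(\Omega)^{-1},$$

so that $\|c\|_{t,2} \le D$ holds if and only if $\lambda(\Omega)^{-1} \le D^2$, while $\inf_{x \in \Omega} c \ge \zeta$ holds if and only if $\zeta \le \lambda(\Omega)^{-1}$. For the illustrative choice $\Omega = (0, 2)$ this reads $c = 1/2$, $\zeta \le 1/2$, and $D \ge 1/\sqrt{2}$.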

In the following let $H_t$ denote the closed affine hyperplane given by $H_t = \{f \in \mathsf{W}_2^t(\Omega) : \int_\Omega f \, d\lambda = 1\}$
endowed with the relative topology it inherits from $\mathsf{W}_2^t(\Omega)$. Note that $\mathcal{P}(t, \zeta, D) \subseteq H_t$ holds.

Proposition 3 Suppose $t > 1/2$ and $0 \le \zeta \le \lambda(\Omega)^{-1} \le D^2 < \infty$.

(a) An element $p \in \mathcal{P}(t, \zeta, D)$ is an interior point¹ of $\mathcal{P}(t, \zeta, D)$ relative to $H_t$ if and only if
(i) $\|p\|_{t,2} < D$ and (ii) $\inf_{x \in \Omega} p(x) > \zeta$ hold.

(b) A (non-empty) subset $\mathcal{P}'$ of $\mathcal{P}(t, \zeta, D)$ is uniformly interior to $\mathcal{P}(t, \zeta, D)$ relative to $H_t$
(meaning that there exists a $\delta > 0$ such that $\mathcal{U}_{t,\delta}(p) \cap H_t \subseteq \mathcal{P}(t, \zeta, D)$ holds for every $p \in \mathcal{P}'$) if
and only if (i) $\sup_{p \in \mathcal{P}'} \|p\|_{t,2} < D$ and (ii) $\inf_{x \in \Omega, p \in \mathcal{P}'} p(x) > \zeta$ hold.

(c) Suppose $\zeta < \lambda(\Omega)^{-1} < D^2$ holds. Then the constant density $\lambda(\Omega)^{-1}$ is interior to
$\mathcal{P}(t, \zeta, D)$ relative to $H_t$. Moreover, the interior of $\mathcal{P}(t, \zeta, D)$ relative to $H_t$ is dense in $\mathcal{P}(t, \zeta, D)$
(w.r.t. the $\mathsf{W}_2^t(\Omega)$-topology).

¹An obvious extension of Theorem V.2.1 in Dunford and Schwartz (1966) to affine spaces shows that in our
setting the notion of an element being interior relative to $H_t$ coincides with the notion of internality of that element
(relative to $H_t$).

We emphasize that for the rest of the paper $t$, $\zeta$, and $D$ will be treated as fixed (although
at arbitrary values) satisfying the constraints $t > 1/2$ and $0 \le \zeta < \lambda(\Omega)^{-1} < D^2 < \infty$ (thus
excluding only the trivial cases where $\mathcal{P}(t, \zeta, D)$ is empty or the singleton $\{\lambda(\Omega)^{-1}\}$). Many
results will hold under the natural condition $\zeta \ge 0$; but for some results we shall have to assume
the stronger requirement $\zeta > 0$. In that context we note that if $D^2$ is sufficiently close to $\lambda(\Omega)^{-1}$,
then $\mathcal{P}(t, 0, D)$ coincides with $\mathcal{P}(t, \zeta, D)$ for sufficiently small $\zeta > 0$, cf. Remark 28 in Appendix A.

For later use we stress that any $p \in \mathcal{P}(t, \zeta, D)$ is continuous on $\Omega$ and satisfies $\|p\|_\Omega \le C_t D$
in view of Part (b) of Proposition 1. We further note the fact that in $\mathcal{P}(t, \zeta, D)$ pointwise
convergence is equivalent to convergence in all Sobolev norms of order smaller than $t$, as well as
to convergence in the sup-norm, as shown in Proposition 27 in Appendix A.

Apart from the maintained assumptions laid out at the beginning of this section, we will 
make frequent use of the assumptions listed below. We start with assumptions on the probability 
measure P governing the data. 

Assumption D The probability measure $P$ has a density $p_\blacktriangle$.

In the following we treat the probability density $p_\blacktriangle$ as a function from $\Omega$ to $\mathbb{R}$, that is, we
let $p_\blacktriangle$ denote a fixed representative of the Radon-Nikodym derivative of $P$ with respect to $\lambda$.
Recall also that $P$ need not correspond to an element of $\mathcal{P}_\Theta$, hence $p_\blacktriangle$ need not be a.e. equal to
an element of $\mathcal{P}_\Theta$.

Assumption D.1 Assumption D holds and the density function $p_\blacktriangle$ belongs to $\mathcal{P}(t, \zeta, D)$.

Assumption D.2 Assumption D holds and the density function $p_\blacktriangle$ satisfies the strict inequality

$$\inf_{x \in \Omega} p_\blacktriangle(x) > 0.$$

Clearly, if $\zeta > 0$, then Assumption D.1 implies Assumption D.2. In light of Proposition 3
the next assumption just states that $p_\blacktriangle$ is an interior point of $\mathcal{P}(t, \zeta, D)$ relative to $H_t$.

Assumption D.3 Assumption D.1 holds and the strict inequalities

$$\inf_{x \in \Omega} p_\blacktriangle(x) > \zeta \quad \text{and} \quad \|p_\blacktriangle\|_{t,2} < D$$

are satisfied.

We note here, however, that even under Assumption D.3 the NPML-estimator is never an
interior point of $\mathcal{P}(t, \zeta, D)$ relative to $H_t$ as shown in Section 4.1; this leads to a number of
complications as discussed prior to Lemma 14 in Section 4.3.

Next are assumptions on the class $\mathcal{P}_\Theta$. We will often write $p(x, \theta)$ for $p_\theta(x)$, and we stress
that $p(x, \theta)$ is a function from $\Omega \times \Theta$ to $\mathbb{R}$.
Assumption P.1 $\mathcal{P}_\Theta \subseteq \mathcal{P}(t, \zeta, D)$.

Assumption P.2 The strict inequality

$$\inf_{x \in \Omega, \theta \in \Theta} p(x, \theta) > 0$$

holds true.

Clearly, if $\zeta > 0$ then Assumption P.1 implies Assumption P.2.

Assumption P.3 Assumption P.1 holds and the strict inequalities

$$\inf_{x \in \Omega, \theta \in \Theta} p(x, \theta) > \zeta \quad \text{and} \quad \sup_{\theta \in \Theta} \|p_\theta\|_{t,2} < D$$

are satisfied.

Assumption P.3 states that $\mathcal{P}_\Theta$ is uniformly interior to $\mathcal{P}(t, \zeta, D)$ relative to $H_t$, cf. Proposition
3. If $\mathcal{P}_\Theta$ happens to be a $\|\cdot\|_{t,2}$-compact subset of $\mathcal{P}(t, \zeta, D)$ (which in light of compactness of
$\Theta$ is, e.g., the case if the map $\theta \mapsto p_\theta$ is $\|\cdot\|_{t,2}$-continuous), Assumption P.3 is clearly equivalent
to $\inf_{x \in \Omega} p(x, \theta) > \zeta$ and $\|p_\theta\|_{t,2} < D$ for every $\theta \in \Theta$ (i.e., equivalent to $\mathcal{P}_\Theta$ belonging to the
interior of $\mathcal{P}(t, \zeta, D)$ relative to $H_t$).

We note that in the correctly specified case, i.e., if there exists a $\theta_0 \in \Theta$ such that $p_{\theta_0}$ is a
density of $P$, Assumptions D.1-D.3 follow automatically from the respective Assumptions P.1-P.3
(and Assumption D trivially holds).

Occasionally we shall also need to refer to the following assumption. However, note that
Assumption P.1 together with Assumption R.1 below already imply this assumption, cf. Proposition
29 in Appendix A.

Assumption P.4 For every $x \in \Omega$, $\theta \mapsto p(x, \theta)$ is a continuous function on $\Theta$.

Remark 4 If Assumption P.1 is satisfied, then in view of Proposition 27 in Appendix A the
following are equivalent: (i) Assumption P.4; (ii) $\theta \mapsto p_\theta$ is continuous as a mapping from $\Theta$ into
the space $(\mathcal{P}(t, \zeta, D), \|\cdot\|_{s,2})$ for every $s$ satisfying $0 \le s < t$; (iii) $\theta \mapsto p_\theta$ is continuous as a
mapping from $\Theta$ into the space $(\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$.

Next are assumptions on the simulation mechanism $\rho(v, \theta)$. Apart from the already assumed
measurability of $\rho(v, \theta)$ in its first argument, we will need assumptions to control its behaviour
in the second argument. We note that Assumption R.2 below is weaker than the corresponding
Assumption R.2 in Gach (2010), but we have been able to obtain the same conclusions as in
Gach (2010) by refining the proofs. Clearly, Assumption R.2 implies Assumption R.1.

Assumption R.1 For every $v \in V$, the simulation mechanism $\rho(v, \theta)$ is continuous in $\theta$.

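Assumption R.2 For some constant $\gamma$, $0 < \gamma \le 1$, and some measurable function $R : V \to
(0, \infty)$, the simulation mechanism $\rho : V \times \Theta \to \Omega$ satisfies

$$|\rho(v, \theta') - \rho(v, \theta)| \le R(v) \|\theta' - \theta\|^\gamma$$

for all $v \in V$ and all $\theta, \theta' \in \Theta$, with the function $R$ satisfying $\int_V R^\alpha \, d\mu < \infty$ for some $\alpha > 0$.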
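To make Assumption R.2 concrete, here is a small numerical check for a hypothetical simulation mechanism that is not taken from the paper: $V = \Omega = (0, 1)$, $\mu$ the uniform distribution, $\Theta = [-1/2, 1/2]$, and $\rho(v, \theta) = v + \theta v (1 - v)$. For this choice $|\rho(v, \theta') - \rho(v, \theta)| = v(1 - v) |\theta' - \theta|$, so the assumption holds with $\gamma = 1$ and the bounded function $R(v) = v(1 - v)$.

```python
import numpy as np

def rho(v, theta):
    # Hypothetical simulation mechanism on V = (0, 1) with values in Omega = (0, 1),
    # valid for |theta| <= 1/2 (an assumption of this sketch, not of the paper).
    return v + theta * v * (1.0 - v)

def R(v):
    # Hoelder (here Lipschitz, gamma = 1) constant of theta -> rho(v, theta).
    return v * (1.0 - v)

rng = np.random.default_rng(1)
v = rng.uniform(size=1000)
v = v[v > 0.0]                       # keep draws in the open interval (0, 1)
thetas = np.linspace(-0.5, 0.5, 21)

# The same draws v are reused for every theta: one simulation serves all parameters.
sims = np.array([rho(v, th) for th in thetas])
gaps = np.abs(sims[:, None, :] - sims[None, :, :])
bounds = R(v)[None, None, :] * np.abs(thetas[:, None, None] - thetas[None, :, None])
holds = bool(np.all(gaps <= bounds + 1e-12))   # small slack for rounding error
print(holds)
```

Since $R$ is bounded by $1/4$ in this example, the integrability condition $\int_V R^\alpha \, d\mu < \infty$ holds for every $\alpha > 0$.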

Assumptions on the class $\mathcal{P}_\Theta$ and on the simulation mechanism $\rho(v, \theta)$ are obviously closely
related. In principle, the assumptions on $\mathcal{P}_\Theta$ could be substituted for by assumptions on $\rho(v, \theta)$.
[Conversely, the existence of a simulation mechanism having certain required properties can in
principle be deduced from suitable assumptions on $\mathcal{P}_\Theta$.] However, the interrelation between
assumptions on $\mathcal{P}_\Theta$ and on $\rho(v, \theta)$ is complicated and intricate, and hence we prefer to work with
the two sets of assumptions as given above. For some results concerning the relationship between 
these two sets of assumptions see Proposition 1^ in Appendix Rl 
4 Non-Parametric Maximum Likelihood Estimators

We now introduce NPML-estimators, called auxiliary estimators in Section 5. Define the
(non-parametric) log-likelihood function based on the given data $X_1, \ldots, X_n$ as

$$L_n(p) := L_n(p; X_1, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^n \log p(X_i)$$

for $p \in \mathcal{P}(t, \zeta, D)$, and based on the simulated data $X_1(\theta) = \rho(V_1, \theta), \ldots, X_k(\theta) = \rho(V_k, \theta)$ as

$$L_k(\theta, p) := L_k(\theta, p; V_1, \ldots, V_k) = \frac{1}{k} \sum_{i=1}^k \log p(\rho(V_i, \theta))$$

for $p \in \mathcal{P}(t, \zeta, D)$ and $\theta \in \Theta$. Note that $L_k(\theta, p) = L_k(p; X_1(\theta), \ldots, X_k(\theta)) = k^{-1} \sum_{i=1}^k \log p(X_i(\theta))$
holds. In view of our convention for the logarithm, both functions $L_n(f)$ and $L_k(\theta, f)$ are in fact
well-defined and take their values in $[-\infty, \infty)$ for any non-negative real-valued function $f$ on $\Omega$.
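The two log-likelihood functions, including the convention $\log 0 = -\infty$, can be written down directly. The sketch below is illustrative only: the function names, the two candidate densities on $\Omega = (0, 1)$, and the mechanism `rho_example` are hypothetical choices and not objects from the paper.

```python
import numpy as np

def log_lik(p, sample):
    # L_n(p): average log-density over the sample. np.log(0.0) evaluates to -inf,
    # which matches the convention log 0 = -infinity used in the text.
    values = p(np.asarray(sample, dtype=float))
    with np.errstate(divide="ignore"):
        return float(np.mean(np.log(values)))

def sim_log_lik(theta, p, v, rho):
    # L_k(theta, p): the same functional evaluated at X_i(theta) = rho(V_i, theta).
    return log_lik(p, rho(np.asarray(v, dtype=float), theta))

# Hypothetical ingredients for Omega = (0, 1).
uniform = lambda x: np.ones_like(x)       # the constant density
triangular = lambda x: 2.0 * x            # a density that vanishes at 0
rho_example = lambda v, theta: v ** theta # illustrative mechanism only

print(log_lik(uniform, [0.2, 0.7]))
print(log_lik(triangular, [0.0, 0.5]))
print(sim_log_lik(2.0, triangular, [0.5], rho_example))
```

A single observation at a point where the candidate density vanishes drives the whole average to $-\infty$, which is why the values lie in $[-\infty, \infty)$ rather than in $\mathbb{R}$.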

An NPML-estimator for given $X_1, \ldots, X_n$ is defined as an element $\hat p_n(\cdot) := \hat p_n(\cdot; X_1, \ldots, X_n)$
of $\mathcal{P}(t, \zeta, D)$ satisfying

$$L_n(\hat p_n) = \sup_{p \in \mathcal{P}(t, \zeta, D)} L_n(p).$$

Similarly, an NPML-estimator for given $X_1(\theta), \ldots, X_k(\theta)$ is an element $\tilde p_k(\theta)(\cdot) := \tilde p_k(\theta)(\cdot; V_1, \ldots, V_k)$
of $\mathcal{P}(t, \zeta, D)$ satisfying

$$L_k(\theta, \tilde p_k(\theta)) = \sup_{p \in \mathcal{P}(t, \zeta, D)} L_k(\theta, p).$$

Clearly we have

$$\tilde p_k(\theta)(\cdot; V_1, \ldots, V_k) = \hat p_k(\cdot; X_1(\theta), \ldots, X_k(\theta)). \qquad (4)$$

In this section we investigate existence, uniqueness, consistency, rates of convergence, and
uniform central limit theorems for NPML-estimators. The results obtained here go beyond Nickl
(2007) in three respects: First, we show not only existence but also uniqueness of the
NPML-estimators. Second, we allow for non-parametric models $\mathcal{P}(t, \zeta, D)$ where the lower bound for
the densities, i.e., $\zeta$, can be equal to $0$ and extend the consistency and rate results for the
NPML-estimator w.r.t. the Sobolev-norms $\|\cdot\|_{s,2}$ with $s < t$ in Nickl (2007) to this case. We
furthermore also establish inconsistency of the NPML-estimator in the $\|\cdot\|_{t,2}$-norm. Third, we
prove that the consistency and rate results in Nickl (2007) for $\hat p_n$ hold for the NPML-estimators
$\tilde p_k(\theta)$ even uniformly over the parameter space $\Theta$ (provided that $\zeta > 0$). Finally, we prove a
uniform Donsker-type theorem which extends Theorem 3 in Nickl (2007) and shows that, for
appropriate classes $\mathcal{F}$, the stochastic process $(\theta, f) \mapsto \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda$ converges weakly in
$\ell^\infty(\Theta \times \mathcal{F})$ to a Gaussian process.

4.1 Existence, Uniqueness, and Consistency of NPML-Estimators 

In the following theorem we show that the NPML-estimators defined above exist, are unique,
and are measurable (cf. also Lemma 55 in Appendix B).

Theorem 5 (a) There exists a unique $\hat p_n \in \mathcal{P}(t, \zeta, D)$ such that

$$L_n(\hat p_n) = \sup_{p \in \mathcal{P}(t, \zeta, D)} L_n(p)$$

holds. The resulting mapping $\hat p_n : \Omega^n \to \mathcal{P}(t, \zeta, D)$ is measurable with respect to the $\sigma$-fields
$\mathcal{B}(\Omega)^n$ and $\mathcal{B}(\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$. Moreover, $\hat p_n$ always satisfies $\|\hat p_n\|_{t,2} = D$.

(b) For each $\theta \in \Theta$ there exists a unique $\tilde p_k(\theta) \in \mathcal{P}(t, \zeta, D)$ such that

$$L_k(\theta, \tilde p_k(\theta)) = \sup_{p \in \mathcal{P}(t, \zeta, D)} L_k(\theta, p)$$

holds. The resulting mapping $\tilde p_k(\theta) : V^k \to \mathcal{P}(t, \zeta, D)$ is measurable with respect to the $\sigma$-fields
$\mathcal{V}^k$ and $\mathcal{B}(\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$. Moreover, $\tilde p_k(\theta)$ always satisfies $\|\tilde p_k(\theta)\|_{t,2} = D$. Furthermore, if
Assumption R.1 is satisfied, then, for arbitrary fixed values of the underlying simulated
variables $V_1, \ldots, V_k$, $\theta \mapsto \tilde p_k(\theta)$ is continuous when viewed as a mapping from $\Theta$ into the space
$(\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$.

Proof. (a) Let $x_1, \ldots, x_n$ be given points in $\Omega$. The existence of a maximizer of $L_n(p) =
L_n(p; x_1, \ldots, x_n)$ follows from the fact that $L_n$ is continuous on the compact space $(\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$
by Part (b1) of Proposition 50 in Appendix B with $T = \mathcal{P}(t, \zeta, D)$ and by Proposition 2. We
next establish uniqueness: Denote by $S$ the set of all $p \in \mathcal{P}(t, \zeta, D)$ that maximize $L_n$, and note
that $S$ is non-empty as just shown. Since $L_n$ is a concave function on the convex set $\mathcal{P}(t, \zeta, D)$
with values in $[-\infty, \infty)$, a standard argument shows that $S$ is convex. If $S$ is a subset of the
Sobolev sphere of radius $D$ we are done, as then $S$ must be a singleton since the Sobolev norm
$\|\cdot\|_{t,2}$, being a Hilbert norm, is strictly convex. Suppose now $S$ is not a subset of the Sobolev
sphere of radius $D$ and let $p \in S$ with $\|p\|_{t,2} < D$. Then there is some $z \in \Omega$ with $p(z) > \zeta$, since
the maintained assumption $\zeta < \lambda(\Omega)^{-1}$ implies that $\zeta \notin \mathcal{P}(t, \zeta, D)$. By continuity of $p$ we may
assume that $z$ is different from any of the finitely many data points $x_1, \ldots, x_n$. We claim that
there is a $q \in \mathcal{P}(t, \zeta, D)$ such that $q(x_i) > p(x_i)$ whenever $x_i = x_1$ and $q$ coincides with $p$ on the
remaining (if any) observations $x_j$ with $x_j \ne x_1$. This will contradict the maximizing property
of $p$ (noting that the case $L_n(q) = L_n(p) = -\infty$ is impossible in view of $\lambda(\Omega)^{-1} \in \mathcal{P}(t, \zeta, D)$ and
$L_n(p) \ge L_n(\lambda(\Omega)^{-1}) > -\infty$). The existence of such a $q$ can be seen as follows: Choose $\varepsilon > 0$
such that $I := [z - 2\varepsilon, z + 2\varepsilon]$, $\bar U := [x_1 - 2\varepsilon, x_1 + 2\varepsilon]$, and $\{x_j : x_j \ne x_1\}$ are pairwise disjoint
subsets of $\Omega$ and $\inf_{x \in I} p(x) > \zeta$. As $A := [x_1 - \varepsilon, x_1 + \varepsilon]$ is a closed set contained in the open
set $U := (x_1 - 2\varepsilon, x_1 + 2\varepsilon)$, there is a compactly supported $C^\infty$-function $f : \Omega \to \mathbb{R}$ with values
in $[0, 1]$ such that $f|_A = 1$ and $f|_{\Omega \setminus U} = 0$. For every $y \in \Omega$ let

$$\tilde f(y) = \begin{cases} f(y + x_1 - z) & \text{if } y + x_1 - z \in \Omega, \\ 0 & \text{otherwise,} \end{cases}$$

so that $\tilde f$ is the translation of $f$ by $z - x_1$; and define $g : \Omega \to \mathbb{R}$ by $g = f - \tilde f$. Then $g$ has values in
$[-1, 1]$, integrates to $0$, and is contained in $\mathsf{W}_2^t(\Omega)$ since it is $C^\infty$ and has compact support in $\Omega$.
Since $\|p\|_{t,2} < D$ and $\inf_{x \in I} p(x) > \zeta$, we can find a scalar $\beta > 0$ such that $\|\beta g\|_{t,2} \le D - \|p\|_{t,2}$
and $\beta \le \inf_{x \in I} p(x) - \zeta$. Let $q = p + \beta g$ and observe that $\|q\|_{t,2} \le \|p\|_{t,2} + \|\beta g\|_{t,2} \le D$. Further,
$q(x) \ge \zeta$ for every $x \in \Omega$, which can be seen as follows: For $x \in \Omega \setminus I$ we have that $g(x) \ge 0$,
and hence $q(x) \ge p(x) \ge \zeta$. If $x \in I$, then $q(x) \ge p(x) - \beta \ge p(x) - \inf_{x \in I} p(x) + \zeta \ge \zeta$, where
the first inequality holds because $g(x) \ge -1$ for every $x \in \Omega$, the second inequality holds by the
choice of $\beta$, and the third one does so since $x \in I$ and therefore $p(x) - \inf_{x \in I} p(x) \ge 0$. It follows
that $q \in \mathcal{P}(t, \zeta, D)$. Since $\beta > 0$ and $g(x_1) = 1$, $q(x_i) > p(x_i)$ whenever $x_i = x_1$. Furthermore, $q$
coincides with $p$ on the remaining (if any) data points because $g$ is $0$ there. The existence of $q$
contradicts the maximizing property of $p$, and consequently $S$ is a subset of the Sobolev sphere
of radius $D$. We thus have established uniqueness as well as $\|\hat p_n\|_{t,2} = D$.

To see that $\hat p_n : \Omega^n \to \mathcal{P}(t, \zeta, D)$ is measurable, we apply Lemma A3 in Pötscher and Prucha
(1997), making use of Proposition 50(a),(b1) in Appendix B. [Because $L_n$ potentially can attain
the value $-\infty$, we apply this lemma to the real-valued function $\arctan(L_n)$ rather than to $L_n$,
where we use the usual convention $\arctan(-\infty) = -\pi/2$.]

(b) The same arguments as above establish existence, uniqueness, and measurability of $\tilde p_k(\theta)$,
as well as $\|\tilde p_k(\theta)\|_{t,2} = D$, for any fixed $\theta \in \Theta$. To see that the mapping $\theta \mapsto \tilde p_k(\theta)$ is continuous as
claimed, apply Lemma 55 in Appendix B with $X = \Theta$, $Y = (\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$, $u(x, y) = L_k(\theta, p)$,
and $v(x) = \tilde p_k(\theta)$. Note that $(\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$ is a compact metric space by Proposition 2 and
that, under Assumption R.1, $L_k(\theta, p)$ is continuous on $\Theta \times (\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$ as can be seen by
applying Part (b2) of Proposition 50 in Appendix B with $T = \mathcal{P}(t, \zeta, D)$. ■

Remark 6 (i) The mapping $\hat p_n : \Omega \times \Omega^n \to \mathbb{R}$ is continuous in the first argument and $\mathcal{B}(\Omega)^n$-measurable in the second argument. Since $\Omega$ is separable, $\hat p_n$ is consequently jointly measurable. Similarly, the mappings $\tilde p_k(\theta) : \Omega \times V^k \to \mathbb{R}$ are jointly measurable for all $\theta \in \Theta$.

(ii) For any $x_1, \ldots, x_n$ in $\Omega$, we have that $\hat p_n(x_i) = \hat p_n(x_i; x_1, \ldots, x_n) > 0$ for $i = 1, \ldots, n$. This follows from the observation made in the above proof that $L_n(\hat p_n) > -\infty$ must hold. By a similar argument we have that $\tilde p_k(\theta)(\rho(v_i,\theta)) = \tilde p_k(\theta)(\rho(v_i,\theta); v_1, \ldots, v_k) > 0$ for $i = 1, \ldots, k$ and for every $\theta \in \Theta$.

We next turn to consistency of the NPML-estimators. Theorem 5 already shows that $\hat p_n$ cannot be consistent in the $\|\cdot\|_{t,2}$-norm as $\|\hat p_n\|_{t,2} = D$ always holds and $\mathcal{P}(t,\zeta,D)$ contains densities with $\|\cdot\|_{t,2}$-norm less than $D$ (under our assumptions on $\zeta$ and $D$). A similar remark applies to $\tilde p_k(\theta)$. However, this does not preclude consistency of the NPML-estimators in other norms as we show next. To this end define for any non-negative measurable function $f$ on $\Omega$ and for any $\theta \in \Theta$

$$L(f) = \int_\Omega \log f \, d\mathbb{P}$$

and

$$L(\theta, f) = \int_V \log f(\rho(\cdot\,; \theta)) \, d\mu$$

provided the respective integral is defined. If $f \in \mathcal{L}^\infty(\Omega)$, then both functions are well-defined and take their values in $[-\infty, \infty)$. We note that the restrictions of $L(f)$ to $\mathcal{P}(t,\zeta,D)$ and of $L(\theta, f)$ to $\Theta \times \mathcal{P}(t,\zeta,D)$ are real-valued in case $\zeta > 0$. We will make use of the following simple facts, which are proved in Appendix B.

Lemma 7 (a) $L(p_\blacktriangle)$ is well-defined and satisfies $L(p_\blacktriangle) > -\infty$, provided Assumption D.1 holds. Similarly, for every $\theta \in \Theta$, $L(\theta, p_\theta)$ is well-defined and satisfies $L(\theta, p_\theta) > -\infty$.

(b) If Assumption D.1 is satisfied, then $p_\blacktriangle$ is the unique maximizer of the function $L(\cdot)$ over $\mathcal{P}(t,\zeta,D)$.

(c) If $p_\theta \in \mathcal{P}(t,\zeta,D)$ for a given $\theta \in \Theta$, then $p_\theta$ is the unique maximizer of the function $L(\theta, \cdot)$ over $\mathcal{P}(t,\zeta,D)$.

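The consistency result is now given below. Under the additional assumption that $\zeta$ is positive, Part (a) of the subsequent theorem already follows from Proposition 6 in Nickl (2007).

Theorem 8 (a) Let Assumption D.1 be satisfied. Then

$$\lim_{n \to \infty} \|\hat p_n - p_\blacktriangle\|_{s,2} = 0 \quad \mathbb{P}\text{-a.s.}$$

for every $s$, $0 \leq s < t$; in particular, $\lim_{n \to \infty} \|\hat p_n - p_\blacktriangle\|_\Omega = 0$ $\mathbb{P}$-a.s.

(b) Let $p_\theta \in \mathcal{P}(t,\zeta,D)$ for a given $\theta \in \Theta$. Then, for the given $\theta$,

$$\lim_{k \to \infty} \|\tilde p_k(\theta) - p_\theta\|_{s,2} = 0 \quad \mu\text{-a.s.}$$

for every $s$, $0 \leq s < t$; in particular, $\lim_{k \to \infty} \|\tilde p_k(\theta) - p_\theta\|_\Omega = 0$ $\mu$-a.s.

(c) Let Assumptions P.1, P.2, and R.1 be satisfied. Then

$$\lim_{k \to \infty} \sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{s,2} = 0 \quad \mu\text{-a.s.}$$

for every $s$, $0 \leq s < t$; in particular, $\lim_{k \to \infty} \sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_\Omega = 0$ $\mu$-a.s.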

Proof. (a) In view of Part (c) of Proposition 1 we may restrict ourselves to the case $1/2 < s < t$. Note that $|L(p_\blacktriangle)| < \infty$ by Assumption D.1 and Part (a) of Lemma 7; also note that the random variables $\log p_\blacktriangle(X_i)$ are $\mathbb{P}$-a.s. real-valued. By Kolmogorov's strong law of large numbers we then have

$$\lim_{n \to \infty} |L_n(p_\blacktriangle) - L(p_\blacktriangle)| = 0 \quad \mathbb{P}\text{-a.s.} \tag{5}$$

Let $\varepsilon_l$ be positive real numbers that converge monotonically to $0$ as $l \to \infty$. Apply the uniform law of large numbers in Part (d1) of Proposition 30 in Appendix B with $\mathcal{F} = \{p + \varepsilon_l : p \in \mathcal{P}(t,\zeta,D)\}$ to see that

$$\lim_{n \to \infty} \sup_{p \in \mathcal{P}(t,\zeta,D)} |L_n(p + \varepsilon_l) - L(p + \varepsilon_l)| = 0 \quad \mathbb{P}\text{-a.s.} \tag{6}$$

for every $l \in \mathbb{N}$. In the following arguments we fix an arbitrary element of the probability 1 event where the statements in (5) and (6) hold true. We now prove that $\|\hat p_n - p_\blacktriangle\|_{s,2}$ converges to $0$ by showing that any subsequence $\hat p_{n'}$ of $\hat p_n$ has another subsequence converging to $p_\blacktriangle$ in the Sobolev norm $\|\cdot\|_{s,2}$. Because $\mathcal{P}(t,\zeta,D)$ is compact in $\mathsf{W}_2^s(\Omega)$ by Proposition 2, there is a subsequence $\hat p_{n''}$ of $\hat p_{n'}$ and some $p^* \in \mathcal{P}(t,\zeta,D)$ such that $\|\hat p_{n''} - p^*\|_{s,2}$ converges to $0$. Now use Assumption D.1, the definition of $\hat p_{n''}$ as maximizer, and the monotonicity of the logarithm to obtain

$$L_{n''}(p_\blacktriangle) \leq L_{n''}(\hat p_{n''}) \leq L_{n''}(\hat p_{n''} + \varepsilon_l) \leq L(\hat p_{n''} + \varepsilon_l) + \sup_{p \in \mathcal{P}(t,\zeta,D)} |L_{n''}(p + \varepsilon_l) - L(p + \varepsilon_l)|. \tag{7}$$

The first term on the r.h.s. of (7) converges to $L(p^* + \varepsilon_l)$ since $\|\hat p_{n''} - p^*\|_{s,2}$, and hence also $\|\hat p_{n''} - p^*\|_\Omega$, converges to $0$ and since $L(\cdot + \varepsilon_l)$ is sup-norm continuous on $\mathcal{P}(t,\zeta,D)$ by Part (c1) of Proposition 30 in Appendix B. The supremum on the r.h.s. of (7) goes to $0$ and $L_{n''}(p_\blacktriangle)$ converges to $L(p_\blacktriangle)$ in view of (6) and (5). It follows that

$$L(p_\blacktriangle) \leq L(p^* + \varepsilon_l). \tag{8}$$

The sequence of functions $\log(p^* + \varepsilon_l)$ is monotonically non-increasing in $l$ with pointwise limit $\log p^*$, and is bounded above by the integrable function $\log(p^* + \varepsilon_1)$. Using the theorem of monotone convergence, we conclude from (8) that $L(p_\blacktriangle) \leq L(p^*)$. Hence, $p^* = p_\blacktriangle$ by Part (b) of Lemma 7.

(b) Follows analogously as Part (a) with p^ replaced by pg. 

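(c) As in the proof of Part (a), we may restrict ourselves to the case $1/2 < s < t$. Define $\zeta^* = \inf_{\Omega \times \Theta} p(x,\theta)$. By hypothesis, $\zeta^* > 0$, and $\mathcal{P}(t,\zeta^*,D)$ is non-empty as it contains $\mathcal{P}_\Theta$. We may now apply Part (d2) of Proposition 30 in Appendix B with $\mathcal{F} = \mathcal{P}(t,\zeta^*,D)$ to get

$$\lim_{k \to \infty} \sup_{\Theta \times \mathcal{P}(t,\zeta^*,D)} |L_k(\theta,p) - L(\theta,p)| = 0 \quad \mu\text{-a.s.} \tag{9}$$

Let $\varepsilon_l$ be as in the proof of Part (a). For each $l \in \mathbb{N}$, Part (d2) of Proposition 30 in Appendix B with $\mathcal{F} = \{p + \varepsilon_l : p \in \mathcal{P}(t,\zeta,D)\}$ implies that

$$\lim_{k \to \infty} \sup_{\theta \in \Theta} \sup_{p \in \mathcal{P}(t,\zeta,D)} |L_k(\theta, p + \varepsilon_l) - L(\theta, p + \varepsilon_l)| = 0 \quad \mu\text{-a.s.} \tag{10}$$

In the following arguments we fix an arbitrary element of the probability 1 event where (9) and (10) hold. Assume that $\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{s,2}$ does not converge to $0$. Then there is some $\eta > 0$ such that for every $k \in \mathbb{N}$ there are $k' \in \mathbb{N}$, $k' \geq k$, and $\theta_{k'} \in \Theta$ that satisfy

$$\|\tilde p_{k'}(\theta_{k'}) - p_{\theta_{k'}}\|_{s,2} > \eta. \tag{11}$$

By compactness of $\Theta$ and compactness of $\mathcal{P}(t,\zeta,D)$ as a subset of $\mathsf{W}_2^s(\Omega)$, we find a subsequence $\tilde p_{k''}(\theta_{k''})$ of $\tilde p_{k'}(\theta_{k'})$ such that $\theta_{k''}$ converges to $\theta^*$ for some $\theta^* \in \Theta$, and $\|\tilde p_{k''}(\theta_{k''}) - p^*\|_{s,2}$ converges to $0$ for some $p^* \in \mathcal{P}(t,\zeta,D)$. So, if $p^*$ equals $p_{\theta^*}$ (which we verify below), then $\|\tilde p_{k''}(\theta_{k''}) - p_{\theta^*}\|_{s,2}$ converges to $0$. Consequently, $\|\tilde p_{k''}(\theta_{k''}) - p_{\theta_{k''}}\|_{s,2}$ converges to $0$ because $p_{\theta_{k''}}$ converges to $p_{\theta^*}$ in $(\mathcal{P}(t,\zeta,D), \|\cdot\|_{s,2})$ in view of Proposition 25 in Appendix A and Remark 3. This is in contradiction to (11), and therefore in contradiction to the assumption that $\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{s,2}$ does not converge to $0$.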

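It remains to show that $p^*$ equals $p_{\theta^*}$. Use Assumption P.1, the definition of $\tilde p_{k''}(\theta_{k''})$ as maximizer, and the monotonicity of the logarithm to obtain

$$L_{k''}(\theta_{k''}, p_{\theta_{k''}}) \leq L_{k''}(\theta_{k''}, \tilde p_{k''}(\theta_{k''})) \leq L_{k''}(\theta_{k''}, \tilde p_{k''}(\theta_{k''}) + \varepsilon_l) \leq L(\theta_{k''}, \tilde p_{k''}(\theta_{k''}) + \varepsilon_l) + \sup_{\theta \in \Theta} \sup_{p \in \mathcal{P}(t,\zeta,D)} |L_{k''}(\theta, p + \varepsilon_l) - L(\theta, p + \varepsilon_l)|. \tag{12}$$

The first term on the r.h.s. of (12) converges to $L(\theta^*, p^* + \varepsilon_l)$ since $\theta_{k''}$ converges to $\theta^*$, $\|\tilde p_{k''}(\theta_{k''}) - p^*\|_{s,2}$, and hence also $\|\tilde p_{k''}(\theta_{k''}) - p^*\|_\Omega$, converges to $0$, and $L(\cdot, \cdot + \varepsilon_l)$ is a continuous function on $\Theta \times (\mathcal{P}(t,\zeta,D), \|\cdot\|_\Omega)$ by Part (c2) of Proposition 30 in Appendix B. Recall that the supremum on the r.h.s. of (12) goes to $0$ in view of (10). Further, the supremum on the r.h.s. of the inequality

$$|L_{k''}(\theta_{k''}, p_{\theta_{k''}}) - L(\theta^*, p_{\theta^*})| \leq \sup_{\Theta \times \mathcal{P}(t,\zeta^*,D)} |L_{k''}(\theta,p) - L(\theta,p)| + |L(\theta_{k''}, p_{\theta_{k''}}) - L(\theta^*, p_{\theta^*})|$$

converges to $0$ by (9). The second term on the r.h.s. goes to $0$ as $\theta_{k''}$ converges to $\theta^*$, $\|p_{\theta_{k''}} - p_{\theta^*}\|_{s,2}$, and hence also $\|p_{\theta_{k''}} - p_{\theta^*}\|_\Omega$, converges to $0$, and $L(\theta,p)$ is a continuous function on $\Theta \times (\mathcal{P}(t,\zeta^*,D), \|\cdot\|_\Omega)$ by Part (c2) of Proposition 30 in Appendix B. Hence, the l.h.s. of (12) goes to $L(\theta^*, p_{\theta^*})$. It follows that

$$L(\theta^*, p_{\theta^*}) \leq L(\theta^*, p^* + \varepsilon_l). \tag{13}$$

The sequence of functions $\log(p^* + \varepsilon_l)(\rho(\cdot,\theta^*))$ is monotonically non-increasing in $l$ with pointwise limit $\log p^*(\rho(\cdot,\theta^*))$, and is bounded above by the integrable function $\log(p^* + \varepsilon_1)(\rho(\cdot,\theta^*))$. Using the theorem of monotone convergence and (13), we conclude that $L(\theta^*, p_{\theta^*}) \leq L(\theta^*, p^*)$. Hence, $p^* = p_{\theta^*}$ by Part (c) of Lemma 7. ■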

Remark 9 For later use we note the following: (i) Let Assumption D.1 be satisfied, and suppose $\chi > 0$ satisfies $\inf_{x \in \Omega} p_\blacktriangle(x) > \chi$. It follows from Part (a) of Theorem 8 that there are events $A_n \in \mathcal{B}(\Omega)^n$ that have $\mathbb{P}^n$-probability tending to $1$ as $n \to \infty$ on which $\inf_{x \in \Omega} \hat p_n(x) > \chi$ holds.

(ii) Let $p_\theta \in \mathcal{P}(t,\zeta,D)$ for a given $\theta \in \Theta$ be satisfied, and suppose $\chi(\theta) > 0$ satisfies $\inf_{x \in \Omega} p(x,\theta) > \chi(\theta)$ for the given $\theta$. It follows from Part (b) of Theorem 8 that for the given $\theta$ there are events $B_k(\theta) \in \mathcal{V}^k$ that have $\mu^k$-probability tending to $1$ as $k \to \infty$ on which $\inf_{x \in \Omega} \tilde p_k(\theta)(x) > \chi(\theta)$ holds.

(iii) Let Assumptions P.1 and R.1 be satisfied, and suppose $\chi > 0$ satisfies $\inf_{\Omega \times \Theta} p(x,\theta) > \chi$. It follows from Part (c) of Theorem 8 that there are events $B_k \in \mathcal{V}^k$ that have $\mu^k$-probability tending to $1$ as $k \to \infty$ on which $\inf_{\theta \in \Theta} \inf_{x \in \Omega} \tilde p_k(\theta)(x) > \chi$ holds.
4.2 Rates of Convergence for NPML-Estimators

Following ideas of van de Geer (1993), Nickl (2007, Proposition 6) obtained convergence rates for the NPML-estimator $\hat p_n$ in various Sobolev norms as

$$\|\hat p_n - p_\blacktriangle\|_{s,2} = O^*_{\mathbb{P}}\bigl(n^{-(t-s)/(2t+1)}\bigr) \tag{14}$$

for every $0 \leq s \leq t$, provided Assumption D.1 and $\zeta > 0$ hold. Modulo measure-theoretic nuisances, this immediately gives an analogous result for $\|\tilde p_k(\theta) - p_\theta\|_{s,2}$ for each $\theta \in \Theta$. [The complication here is that the result in Nickl (2007) is proved for data generating processes defined as coordinate projections on a product space, which is not the case for $X_i(\theta)$; cf. the proof of Part (b) of the subsequent proposition.] In Section 4.3 below, however, we shall need convergence rates for $\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{s,2}$, i.e., convergence rates that hold uniformly w.r.t. $\theta \in \Theta$. Before we turn to these uniform results, we provide an extension of Nickl's (2007) rate result in that we avoid the restriction $\zeta > 0$. Note that Assumption D.2 already follows from Assumption D.1 in case $\zeta > 0$.

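Proposition 10 (a) Under Assumptions D.1 and D.2 we have $\|\hat p_n - p_\blacktriangle\|_{s,2} = O_{\mathbb{P}}\bigl(n^{-(t-s)/(2t+1)}\bigr)$ for every $0 \leq s \leq t$. (b) If $p_\theta \in \mathcal{P}(t,\zeta,D)$ and $\inf_{x \in \Omega} p(x,\theta) > 0$ hold for a given $\theta \in \Theta$, then $\|\tilde p_k(\theta) - p_\theta\|_{s,2} = O_\mu\bigl(k^{-(t-s)/(2t+1)}\bigr)$ for every $0 \leq s \leq t$ and the given $\theta$.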

Proof, (a) Measurability of \\pn — Pa lis, 2 is established in Proposition [Ml in Appendix iDl The 
result is trivial in case $s = t$ since $\mathcal{P}(t,\zeta,D)$ is a bounded subset of $\mathsf{W}_2^t(\Omega)$. Hence assume $s < t$. If $\zeta > 0$, the result follows from Proposition 6 in Nickl (2007). Now suppose $\zeta = 0$. By Assumption D.2 we can then choose $\chi > 0 = \zeta$ such that $\inf_{x \in \Omega} p_\blacktriangle(x) > \chi$ holds. By Remark 9(i) we have that $\hat p_n \in \mathcal{P}(t,\chi,D)$ on events $A_n \in \mathcal{B}(\Omega)^n$ that have probability tending to $1$ as $n \to \infty$. Since $\mathcal{P}(t,\chi,D) \subseteq \mathcal{P}(t,\zeta,D)$, the NPML-estimator $\hat p_n$ over $\mathcal{P}(t,\zeta,D)$ coincides with the NPML-estimator over the smaller set $\mathcal{P}(t,\chi,D)$ on these events, and the latter estimator satisfies (14) by Proposition 6 in Nickl (2007).

(b) In view of (|4]) and since (xi, . . . , Xk) i— ^ Pk{'', a^i, . . . , Xk) is a measurable mapping from 
ri*^ into {Vit, D),\\ ' lls^); '^f- Theorem [SI Pk{S) has the same law as pk{-; Zi, . . . , Zk), where 
(Zi, . . . , Zk) has the same distribution as {Xi{9), . . . ,Xk{9)) but the Zi are given by the coor- 
dinate projections on (17^, S(51)^). Since || • |jn and || • \\s.2 for s < t generate the same Borel 
(T-field on VitXjD) (cf. Lemma [551 in Appendix [D]), ||pfc(6') -— pe\\s,2 is measurable and has the 
same distribution as \\pki'', Zi, . . . , Zk) — Pe\\s,2- Now apply the already established Part (a) to 
Pk{-;Zi,...,Zk). m 

In case $s = t$, in fact $\|\hat p_n - p_\blacktriangle\|_{s,2} \leq 2D$ and $\|\tilde p_k(\theta) - p_\theta\|_{s,2} \leq 2D$ hold under the assumptions of the above proposition. The next proposition is instrumental in proving the uniform-in-$\theta$ convergence rate result.

Proposition 11 Let T be a (non-empty) bounded subset of W|(f7) with s > 1/2. Suppose 
Assumvtion lR.^ holds. 

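(a) Then the $\mathcal{L}^2(\mu)$-bracketing metric entropy of

$$\mathcal{F}^* = \{f(\rho(\cdot,\theta)) : \theta \in \Theta, f \in \mathcal{F}\}$$

satisfies

$$H_{[\,]}(\varepsilon, \mathcal{F}^*, \|\cdot\|_{2,\mu}) \lesssim \varepsilon^{-1/s}. \tag{15}$$

In particular, $\mathcal{F}^*$ is $\mu$-Donsker.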
(b) Suppose the elements of $\mathcal{F}$ are bounded below by some $\chi > 0$. Then the $\mathcal{L}^2(\mu)$-bracketing metric entropy of

$$\log \mathcal{F}^* = \{\log f(\rho(\cdot\,;\theta)) : \theta \in \Theta, f \in \mathcal{F}\}$$

satisfies

$$H_{[\,]}(\varepsilon, \log \mathcal{F}^*, \|\cdot\|_{2,\mu}) \lesssim \varepsilon^{-1/s}. \tag{16}$$

We note that in the subsequent uniform-in-$\theta$ convergence rate result Assumption P.2 already follows from Assumption P.1 in case $\zeta > 0$.

Theorem 12 Let Assumvtions \P.l[ IP.H and \R.S\ be satisfied. Then 

$$\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{s,2} = O_\mu\bigl(k^{-(t-s)/(2t+1)}\bigr) \quad \text{as } k \to \infty$$

for every $0 \leq s < t$. [In case $s = t$, the above supremum is bounded by $2D$.]
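To get a feeling for how the rate in Theorem 12 degrades as the norm index $s$ approaches the smoothness $t$, the exponent $(t-s)/(2t+1)$ can be tabulated exactly. The following is only an illustrative sketch: the function name `rate_exponent` is hypothetical, and the check `t > 1/2` reflects the standing smoothness requirement as we read it, not a statement taken from the theorem itself.

```python
from fractions import Fraction


def rate_exponent(s, t):
    """Return a = (t - s) / (2t + 1), so that the rate reads k**(-a).

    Assumes 0 <= s <= t and t > 1/2 (hypothetical guard, see lead-in).
    """
    s, t = Fraction(s), Fraction(t)
    if t <= Fraction(1, 2) or not (0 <= s <= t):
        raise ValueError("need t > 1/2 and 0 <= s <= t")
    return (t - s) / (2 * t + 1)


if __name__ == "__main__":
    # Exponents for t = 2 and a few norm indices s.
    for s in (0, 1, Fraction(3, 2), 2):
        print(f"s = {s}: k^(-{rate_exponent(s, 2)})")
```

For $s = 0$ this is the familiar $L^2$-exponent $t/(2t+1)$, and for $s = t$ the exponent is $0$, which matches the remark that the supremum is then merely bounded by $2D$.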

Proof. Measurability of $\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{s,2}$ for $0 \leq s \leq t$ is established in Proposition 36 in Appendix D. The claim in parentheses follows since $\tilde p_k(\theta) \in \mathcal{P}(t,\zeta,D)$ by construction and $p_\theta \in \mathcal{P}(t,\zeta,D)$ by Assumption P.1. We now distinguish two cases:

Case 1: Assume first that C > and s — 0. We then verify the conditions of Theorem [551 
in Appendix E with (A,^^') = iV^\V^,^l^), S ^ 0, T = V{tX.D), d{p,q) = \\p - qh, 
Hk{cr,T) = Lk{0,p), H{a,T) — L{9,p), Tk{(j) —Pk{d), and T(cr) =pe- Condition ((39|) is satisfied 
by definition of the NPML-estimators pk (0) . Condition ((37)) follows from the second-order Taylor 
expansion of L{0, •) around the density pg: using Proposition l31l in Appendix [B] we obtain 

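$$L(\theta,p) - L(\theta,p_\theta) = \mathbf{D}L(\theta,p_\theta)(p - p_\theta) + \tfrac{1}{2}\mathbf{D}^2L(\theta,\bar p)(p - p_\theta, p - p_\theta) = -\tfrac{1}{2}\int_\Omega \bar p^{-2}(p - p_\theta)^2 p_\theta \, d\lambda \leq -\tfrac{1}{2}\zeta (C_t D)^{-2} \|p - p_\theta\|_2^2,$$

where $\bar p$ is some density on the line segment joining $p$ and $p_\theta$; note that $\bar p \in \mathcal{P}(t,\zeta,D)$ by convexity of this set, and hence satisfies $\|\bar p\|_\Omega \leq C_t D$. Here the first-order term vanishes because $p$ and $p_\theta$ both integrate to one, and the inequality uses $p_\theta \geq \zeta$. This proves the condition in question with $C = 2^{-1}\zeta (C_t D)^{-2}$ and $\alpha = 2$, both constants being independent of $\theta$ and $p$.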
Next we verify condition (|55)) : set 

Gs = {\ogp{p(-,9)) - \ogpe{p{-,9)) ■.9eQ,pe V{t, (, D), \\p - peh < S} 

for S > 0, which is clearly non-empty. Then clearly 



E sup sup 

see p(=v{t,c,D), 

\\P-P0 \\2<S 



Vk(Lk - L){9,p) - Vk{Lk - L){9,pe) 



E 



where $E^*$ denotes the outer expectation. Since we have temporarily assumed $\zeta > 0$, the logarithm is Lipschitz on $[\zeta, \infty)$ with Lipschitz constant $\zeta^{-1}$. This implies that $\mathcal{G}_\delta$ is bounded by $B := 2\zeta^{-1} C_t D$ in the sup-norm and by $\eta(\delta) := \zeta^{-1} C_t^{1/2} D^{1/2} \delta$ in the $\mathcal{L}^2(\mu)$-norm. Consequently,



E 



B 



ri(5)^\/k 



by Theorem [551 in Appendix |E| Since 



Gs C {logp{p{;9))-\ogpe{pi;9)):9ee,peVitX,D)} 

c {logp{p{;9)) ■.9ee,pe nt, C, D)} - {logp(p(., 9)):0ee,pe r{t, C, D)}, 
we have that



N[]{e, Gs, II • h,,) < iV[ ](e/2, {\ogp{p{;d)) : 9 e Q, p £ Vit, C, D)} , \\ ■ lU,^)^. 
Applying Proposition fTTTb) with s — t and F = V{t^ D) we get from this inequahty 



I[M5).Q5,\\-\\2.^.) 



< 



V 1 + £ ^/*de < niax(r/((5), / e 
{6,5^-'/^'). 



Hence there is some constant L, < L < oo, such that 



max((5, S 



1-1/2U 



holds for all S > 0. Write (pi;{S) for the r.h.s. of the last display and note that 6 i-^ S^'^Lpi^{S) is 
non- increasing ioi /3 — 1. This establishes condition ([55]) in Theorem [551 

Condition (PH]) in that theorem is satisfied for a — 2 and rfc = ;jt/(2t+i) This gives the desired 
rate and completes the proof in case $\zeta > 0$ and $s = 0$. Now suppose $\zeta > 0$ but $0 < s < t$. Recall that $\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{t,2} \leq 2D$. The result then follows from the interpolation inequality

$$\|f\|_{s,2} \leq C_{s,t} \|f\|_{t,2}^{s/t} \|f\|_2^{1 - s/t}$$

for $f \in \mathsf{W}_2^t(\Omega)$, where $C_{s,t} > 0$; see Theorem 1.9.6 and Remark 1.9.1 in Lions and Magenes (1972).
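The exponent bookkeeping behind this interpolation step can be written out explicitly. The sketch below takes as given the $s = 0$ rate $O_\mu(k^{-t/(2t+1)})$ and the bound $2D$ in the $\|\cdot\|_{t,2}$-norm from the text, and assumes the interpolation inequality in the standard form $\|f\|_{s,2} \leq C_{s,t}\|f\|_{t,2}^{s/t}\|f\|_2^{1-s/t}$; the shorthand $f_k$ is introduced here only for illustration.

```latex
% Shorthand (ours): f_k = \tilde p_k(\theta) - p_\theta.
% Inputs: sup_theta ||f_k||_2 = O_mu(k^{-t/(2t+1)}),  sup_theta ||f_k||_{t,2} <= 2D.
\begin{align*}
\sup_{\theta\in\Theta}\|f_k\|_{s,2}
  &\le C_{s,t}\,\Bigl(\sup_{\theta\in\Theta}\|f_k\|_{t,2}\Bigr)^{s/t}
       \Bigl(\sup_{\theta\in\Theta}\|f_k\|_{2}\Bigr)^{1-s/t} \\
  &\le C_{s,t}\,(2D)^{s/t}\;
       O_\mu\!\Bigl(k^{-\frac{t}{2t+1}\cdot\frac{t-s}{t}}\Bigr)
   \;=\; O_\mu\!\bigl(k^{-(t-s)/(2t+1)}\bigr).
\end{align*}
```

The only arithmetic involved is $\frac{t}{2t+1}\cdot\bigl(1-\frac{s}{t}\bigr) = \frac{t-s}{2t+1}$, which is why the intermediate norms inherit a rate that interpolates linearly in $s$ between the $L^2$-rate and the trivial bound at $s = t$.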

Case 2: Suppose now $\zeta = 0$ and $0 \leq s < t$. In view of Assumption P.2 we may choose $\chi > 0$ such that $\inf_{\Omega \times \Theta} p(x,\theta) > \chi$. Then, by Remark 9(iii), there are events that have probability tending to $1$ on which $\inf_{\theta \in \Theta} \inf_{x \in \Omega} \tilde p_k(\theta)(x) > \chi$ holds true. Since $\mathcal{P}(t,\chi,D) \subseteq \mathcal{P}(t,\zeta,D)$, we have that on these events $\tilde p_k(\theta)$ coincides with the NPML-estimators over the smaller set $\mathcal{P}(t,\chi,D)$. The result now follows from what has already been established in Case 1 since Assumption P.1 (and P.2) is also satisfied with respect to $\mathcal{P}(t,\chi,D)$. ■



4.3 Donsker-type Theorems for NPML-Estimators 

Nickl (2007) established Part (a) of the following Donsker-type result under the additional assumption that $\zeta > 0$ holds. Part (b) is (modulo measure-theoretic nuisances) a simple consequence of Part (a).

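Theorem 13 Let $\mathcal{F}$ be a non-empty bounded subset of $\mathsf{W}_2^s(\Omega)$ for some $s > 1/2$.

(a) Suppose Assumption D.3 is satisfied. Then, for all real $j > 1/2$,

$$\sup_{f \in \mathcal{F}} \left| \sqrt{n} \int_\Omega (\hat p_n - p_\blacktriangle) f \, d\lambda - \sqrt{n}(\mathbb{P}_n - \mathbb{P}) f \right| = o_{\mathbb{P}}\bigl(n^{-(\min(s,t) - j)/(2t+1)}\bigr) \tag{17}$$

as $n \to \infty$; in particular, the l.h.s. of the above display is $o_{\mathbb{P}}(1)$ as $n \to \infty$. Consequently, the stochastic process $f \mapsto \sqrt{n} \int_\Omega (\hat p_n - p_\blacktriangle) f \, d\lambda$ converges weakly to a $\mathbb{P}$-Brownian bridge in $\ell^\infty(\mathcal{F})$.

(b) Suppose $p_\theta \in \mathcal{P}(t,\zeta,D)$, $\inf_{x \in \Omega} p(x,\theta) > \zeta$, and $\|p_\theta\|_{t,2} < D$ hold for a given $\theta \in \Theta$. Then, for the given $\theta$, a result analogous to Part (a) holds for the process $f \mapsto \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda$ with $\mathbb{P}_k$ and $\mathbb{P}$, respectively, replaced by $\mathbb{P}_{\theta,k}$ and $\mathbb{P}_\theta$, where $\mathbb{P}_{\theta,k}$ is the empirical measure of $X_1(\theta), \ldots, X_k(\theta)$ and $\mathbb{P}_\theta$ is the probability measure corresponding to $p_\theta$.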
Proof. (a) Measurability of the l.h.s. of (17) follows from Proposition 37 in Appendix D. For $\zeta > 0$ the result follows immediately from Theorem 3 in Nickl (2007). Now suppose $\zeta = 0$. In view of Assumption D.2 we may choose $\chi > 0$ such that $\inf_{x \in \Omega} p_\blacktriangle(x) > \chi$. Then, by Remark 9(i), there are events that have probability tending to $1$ on which $\inf_{x \in \Omega} \hat p_n(x) > \chi$ holds true. Since $\mathcal{P}(t,\chi,D) \subseteq \mathcal{P}(t,\zeta,D) = \mathcal{P}(t,0,D)$, we have that on these events $\hat p_n$ coincides with the NPML-estimators over the smaller set $\mathcal{P}(t,\chi,D)$. Since $\chi > 0$ and since Assumption D.3 is also satisfied relative to $\mathcal{P}(t,\chi,D)$, the result now follows from what has already been established.

(b) Note that X/c(x, /) and supjgjr \Xk{x, /) — 2)fc(i, /)| defined in Proposition \T7\ a.) in Ap- 
pendix |D] are Borel measurable on fj'^ . Consequently, 

sup \XkiX,i9), . . .,Xki9)J) - 2}fc(Xi(0), . . .,Xki9), f)\ 

and 

sup \Xk{Zi,...,Zk,f) -2}fc(Zi,...,ZA;,/)| 

have the same distribution, where the Zi are as in the proof of Proposition 1101 Furthermore, 
it follows that the finite-dimensional distributions of the processes / i~> Xk {Xi (6), . . . , Xk (0) , /) 
and / i-> Xfe(Zi, . . . , Zfc, /) coincide. It is easy to see that the maps / — > Xk{x,f) belong to 
C''(^, II • II o), the space of bounded uniformly continuous functions on (J", || • ||o). Consequently, 
Xk{x, ■) is Borel measurable as a random element in C^{F, \\ ■ ||si), since the Borel cr-field on this 
space is generated by the point-evaluations (observe that (J^, || • ||o) is totally bounded in view 
of Lemma IMI in Appendix IU|) . Since C°(J-", || • ||n) is Polish by total boundedness of (J^, || • \\n), 
the entire laws of the processes / i— )■ Xfc(A"i(0), . . . , A"fc(0), /) and / i-^ Xfc(Zi, . . . , Z^, /) on 
C"(J^, II • \\n), and hence on £°°(J^), coincide. In view of Q, Part (b) now follows from applying 
the already established Part (a) to Pk{'\ ^i, • • ■ , Zk). ■ 

The next theorem shows that a weak limit theorem for the stochastic process $(\theta, f) \mapsto \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda$ can be obtained even in the space $\ell^\infty(\Theta \times \mathcal{F})$. A corollary of this is then a uniform-in-$\theta$ version of Part (b) of the above theorem. The proof of this theorem largely follows the ideas in Nickl (2007): Loosely speaking, a mean-value expansion of $\mathbf{D}L_k(\theta, \tilde p_k(\theta))(\cdot)$, analogous to the one in the classical parametric case, shows that this can be represented as the sum of the score evaluated at the true density $p_\theta$, i.e., $\mathbf{D}L_k(\theta, p_\theta)(\cdot)$, plus a second derivative term applied to the estimation error $(\tilde p_k(\theta) - p_\theta)$. [For given $\theta \in \Theta$, the Fréchet derivative of $L_k$ with respect to the second argument is here denoted by $\mathbf{D}L_k(\theta, \cdot)$.] The score, evaluated at the true density $p_\theta$ and properly scaled, turns out to be an empirical process having a Gaussian limit. The second derivative term turns out to coincide with $-\int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda$ up to negligible terms. [An important ingredient for establishing negligibility are the uniform-in-$\theta$ convergence rates for $\tilde p_k(\theta)$ in different Sobolev norms that have been established in the previous section.] Apart from a series of technical difficulties not present in the classical parametric case, the major difficulty is then the following: in the classical parametric case the usual assumption that the true parameter belongs to the interior of the parameter space together with consistency implies that the estimator is eventually an interior point, implying that the score evaluated at the maximizer is zero. In the present case, while $p_\theta$ is an interior point of $\mathcal{P}(t,\zeta,D)$ relative to $\mathsf{H}_t$ as a consequence of the assumptions underlying Theorem 15, the estimator $\tilde p_k(\theta)$ is, however, not an interior point of the domain $\mathcal{P}(t,\zeta,D)$ (relative to $\mathsf{H}_t$) over which optimization is performed, as shown in Theorem 5; in particular, $\tilde p_k(\theta)$ is not consistent w.r.t. the $\|\cdot\|_{t,2}$-norm. As a consequence, one cannot conclude that the score evaluated at the maximizer is zero. [Trying to save this argument directly by using a $\|\cdot\|_{s,2}$-norm with $s < t$ does not work either: while $\tilde p_k(\theta)$ is consistent in the $\|\cdot\|_{s,2}$-norm, $p_\theta$ is then not an interior point of $\mathcal{P}(t,\zeta,D)$ relative to $\mathsf{H}_s$.] Hence, a different reasoning is needed to show that $\mathbf{D}L_k(\theta, \tilde p_k(\theta))(\cdot)$, although not necessarily zero, is of sufficiently small order. This is provided in the subsequent lemma, which is essentially a uniform version of Lemma 4 in Nickl (2007). The proof as given below makes use of Proposition 3, which allows us to simplify the arguments given in Nickl (2007). In the following lemma let $\mathsf{H}_t^0$ denote the linear subspace of $\mathsf{W}_2^t(\Omega)$ that is parallel to $\mathsf{H}_t$.



Lemma 14 Suppose Assumvtions \P73\ and \R.S\ are satisfied and C > holds. Let Q be a non- 
empty bounded subset of C W|(f2). Then 

$$\sup_{\theta \in \Theta} \sup_{g \in \mathcal{G}} |\mathbf{D}L_k(\theta, \tilde p_k(\theta))(g)| = o_\mu\bigl(k^{-(t-j)/(2t+1) - 1/2}\bigr) \tag{18}$$

for every real $j > 1/2$.

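Proof. Measurability of the l.h.s. of (18) follows from Proposition 37(c) in Appendix D. W.l.o.g. we may assume $1/2 < j < t$. By Assumption P.3 and Proposition 3(b) we can find $\delta > 0$ small enough such that

$$p_\theta + w \in \mathcal{P}(t,\zeta,D)$$

holds for every $\theta \in \Theta$ and every $w \in \mathcal{U}_{t,\delta} \cap \mathsf{H}_t^0$. Note that $\delta$ does not depend on $\theta$. Since $\tilde p_k(\theta)$ maximizes $L_k(\theta, \cdot)$ (which is differentiable in view of Proposition [21] as $\zeta > 0$ is assumed) over $\mathcal{P}(t,\zeta,D)$ we conclude that

$$\mathbf{D}L_k(\theta, \tilde p_k(\theta))(p_\theta + w - \tilde p_k(\theta)) \leq 0$$

holds for all $\theta \in \Theta$ and all $w \in \mathcal{U}_{t,\delta} \cap \mathsf{H}_t^0$. This implies

$$\mathbf{D}L_k(\theta, \tilde p_k(\theta))(w) \leq \mathbf{D}L_k(\theta, \tilde p_k(\theta))(\tilde p_k(\theta) - p_\theta)$$

for all $\theta \in \Theta$ and $w \in \mathcal{U}_{t,\delta} \cap \mathsf{H}_t^0$. Since $\mathcal{U}_{t,\delta} \cap \mathsf{H}_t^0$ is invariant under multiplication by $-1$, we obtain

$$\begin{aligned}
\sup_{\theta \in \Theta} \sup_{w \in \mathcal{U}_{t,\delta} \cap \mathsf{H}_t^0} |\mathbf{D}L_k(\theta, \tilde p_k(\theta))(w)| &\leq \sup_{\theta \in \Theta} |\mathbf{D}L_k(\theta, \tilde p_k(\theta))(\tilde p_k(\theta) - p_\theta)| \\
&\leq \sup_{\theta \in \Theta} |(\mathbf{D}L_k(\theta, \tilde p_k(\theta)) - \mathbf{D}L(\theta, \tilde p_k(\theta)))(\tilde p_k(\theta) - p_\theta)| \\
&\quad + \sup_{\theta \in \Theta} |(\mathbf{D}L(\theta, \tilde p_k(\theta)) - \mathbf{D}L(\theta, p_\theta))(\tilde p_k(\theta) - p_\theta)| \\
&\leq \sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{j,2} \sup_{\Theta \times \mathcal{P}(t,\zeta,D)} \|\mathbf{D}L_k(\theta,p) - \mathbf{D}L(\theta,p)\|_{\mathcal{U}_{j,1}} + \zeta^{-1} \sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_2^2,
\end{aligned}$$

where we have repeatedly used Proposition [21], in particular to establish that $\mathbf{D}L(\theta, p_\theta)(\tilde p_k(\theta) - p_\theta) = 0$. Now use Theorem 12 and Proposition [21] with $\alpha = 1$ and $\mathcal{H}_1 = \mathcal{U}_{j,1}$ to conclude that the r.h.s. of the last display is of the order claimed in (18), since $j > 1/2$. A fortiori this holds for all $j > 1/2$ and thus proves the result for the case where $\mathcal{G}$ is contained in $\mathcal{U}_{t,\delta} \cap \mathsf{H}_t^0$. Since (18) is homogeneous w.r.t. scaling of $\mathcal{G}$ and since $\delta$ does not depend on $\theta$, the just mentioned inclusion can, however, always be achieved by rescaling. ■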
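The role of the condition $j > 1/2$ in the last step of the proof can be made explicit by comparing exponents. The sketch below assumes, as we read the argument, that the first product on the right-hand side is of order $k^{-(t-j)/(2t+1)} \cdot k^{-1/2}$ (Theorem 12 in the $\|\cdot\|_{j,2}$-norm times a root-$k$ rate for the derivative difference) and that the squared $L^2$-term is of order $k^{-2t/(2t+1)}$ (Theorem 12 with $s = 0$, squared); these orders are stated here as working assumptions.

```latex
% Compare the exponent of the squared L^2-term with that of the leading product:
\begin{align*}
\frac{2t}{2t+1} - \Bigl(\frac{t-j}{2t+1} + \frac12\Bigr)
  &= \frac{4t - 2(t-j) - (2t+1)}{2(2t+1)}
   = \frac{2j-1}{2(2t+1)} ,
\end{align*}
% which is strictly positive exactly when j > 1/2. Hence
\begin{align*}
k^{-2t/(2t+1)} = o\bigl(k^{-(t-j)/(2t+1)-1/2}\bigr)
  \qquad\text{whenever } j > \tfrac12 .
\end{align*}
```

So for $j > 1/2$ the quadratic remainder is of strictly smaller order than the leading product and does not affect the final rate.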

We note that the lemma can easily be extended to the case C = by making use of Remark 
[njiii). The main result is now the following. 
Theorem 15 Suppose Assumptions [7731 and \R.A are satisfied. Let J- be a non-empty hounded
subset of^N^i^) for some s > 1/2. Then: 
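(a) For all real $j > 1/2$,

$$\sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} \left| \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda - \sqrt{k}(\mu_k - \mu) f(\rho(\cdot,\theta)) \right| = o_\mu\bigl(k^{-(\min(s,t) - j)/(2t+1)}\bigr) \tag{19}$$

as $k \to \infty$; in particular, the l.h.s. of the above display is $o_\mu(1)$ as $k \to \infty$.

(b) There exists a zero-mean Gaussian process $\mathbb{G}$ indexed by $\Theta \times \mathcal{F}$ with bounded sample paths such that the stochastic process $(\theta, f) \mapsto \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda$ converges weakly to $\mathbb{G}(\theta, f)$ in $\ell^\infty(\Theta \times \mathcal{F})$. The process $\mathbb{G}$ is measurable as a mapping with values in $\ell^\infty(\Theta \times \mathcal{F})$, has separable range, and has sample paths that are uniformly continuous with respect to the pseudo-metric $d((\theta,f),(\theta',g)) = (\mathrm{Var}[\mathbb{G}(\theta,f) - \mathbb{G}(\theta',g)])^{1/2}$. Its covariance function is given by

$$\mathrm{Cov}[\mathbb{G}(\theta,f), \mathbb{G}(\theta',g)] = \int_V \left( f(\rho(\cdot\,;\theta)) - \int_V f(\rho(\cdot,\theta)) \, d\mu \right) \left( g(\rho(\cdot\,;\theta')) - \int_V g(\rho(\cdot\,;\theta')) \, d\mu \right) d\mu.$$

(c)

$$\sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} \left| \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda \right| = O_\mu(1) \quad \text{as } k \to \infty.$$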

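Proof. Part (a): Measurability of the l.h.s. of (19) follows from Proposition 37(b) in Appendix D.

Step 1: We first consider the case $\zeta > 0$. Let $\mathcal{G}$ be a non-empty bounded subset of $\mathsf{H}_t^0$. Applying the pathwise mean-value theorem to the function $\mathbf{D}L_k(\theta, \cdot)(g)$, adding and subtracting a term, and using Proposition 31 leads to

$$\begin{aligned}
\mathbf{D}L_k(\theta, \tilde p_k(\theta))(g) &= \mathbf{D}L_k(\theta, p_\theta)(g) + \mathbf{D}^2L_k(\theta, \bar p_k(\theta))(\tilde p_k(\theta) - p_\theta, g) \\
&= (\mu_k - \mu)(p_\theta^{-1} g)(\rho(\cdot\,;\theta)) + \mathbf{D}^2L(\theta, p_\theta)(\tilde p_k(\theta) - p_\theta, g) \\
&\quad + \left[\mathbf{D}^2L_k(\theta, \bar p_k(\theta)) - \mathbf{D}^2L(\theta, p_\theta)\right](\tilde p_k(\theta) - p_\theta, g),
\end{aligned}$$

where $\bar p_k(\theta) = \xi \tilde p_k(\theta) + (1 - \xi) p_\theta$ for some $\xi \in (0,1)$; note that $\bar p_k(\theta) \in \mathcal{P}(t,\zeta,D)$ by convexity. In the above display we have also made use of the fact that $\mu(p_\theta^{-1} g)(\rho(\cdot,\theta)) = 0$ since $g \in \mathsf{H}_t^0$. Again adding and subtracting a term and using Proposition 31 this leads to

$$\begin{aligned}
\mathbf{D}L_k(\theta, \tilde p_k(\theta))(g) &= (\mu_k - \mu)(p_\theta^{-1} g)(\rho(\cdot,\theta)) - \int_\Omega p_\theta^{-1}(\tilde p_k(\theta) - p_\theta) g \, d\lambda \\
&\quad + \left[\mathbf{D}^2L_k(\theta, \bar p_k(\theta)) - \mathbf{D}^2L(\theta, \bar p_k(\theta))\right](\tilde p_k(\theta) - p_\theta, g) \\
&\quad + \int_\Omega \bar p_k^{-2}(\theta) p_\theta^{-1} (\bar p_k^2(\theta) - p_\theta^2)(\tilde p_k(\theta) - p_\theta) g \, d\lambda.
\end{aligned}$$

Consequently, for every real $j$ with $1/2 < j < t$ we obtain

$$\begin{aligned}
&\sup_{\theta \in \Theta} \sup_{g \in \mathcal{G}} \left| \int_\Omega p_\theta^{-1}(\tilde p_k(\theta) - p_\theta) g \, d\lambda - (\mu_k - \mu)(p_\theta^{-1} g)(\rho(\cdot,\theta)) \right| \\
&\quad \leq \sup_{\theta \in \Theta} \sup_{g \in \mathcal{G}} |\mathbf{D}L_k(\theta, \tilde p_k(\theta))(g)| \\
&\qquad + \sup_{\theta \in \Theta} \sup_{g \in \mathcal{G}} \left| \left[\mathbf{D}^2L_k(\theta, \bar p_k(\theta)) - \mathbf{D}^2L(\theta, \bar p_k(\theta))\right](\tilde p_k(\theta) - p_\theta, g) \right| \\
&\qquad + \sup_{\theta \in \Theta} \sup_{g \in \mathcal{G}} \left| \int_\Omega \bar p_k^{-2}(\theta) p_\theta^{-1} (\bar p_k^2(\theta) - p_\theta^2)(\tilde p_k(\theta) - p_\theta) g \, d\lambda \right| \\
&\quad = I + II + III,
\end{aligned} \tag{20}$$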
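where $I = o_\mu\bigl(k^{-(t-j)/(2t+1) - 1/2}\bigr)$ by Lemma 14. We next bound expressions $II$ and $III$: Clearly,

$$II \leq \sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{j,2} \sup_{\Theta \times \mathcal{P}(t,\zeta,D)} \|\mathbf{D}^2L(\theta,p) - \mathbf{D}^2L_k(\theta,p)\|_{\mathcal{U}_{j,1} \times \mathcal{G}}.$$

The first supremum in the above display is $O_\mu\bigl(k^{-(t-j)/(2t+1)}\bigr)$ by Theorem 12. Since $\mathcal{G}$ is bounded in $\mathsf{W}_2^t(\Omega)$ and hence also in $\mathsf{W}_2^j(\Omega)$ as $j < t$ (cf. Proposition 1), and since $\mathcal{U}_{j,1}$ is clearly bounded in $\mathsf{W}_2^j(\Omega)$, the second supremum in the above display is $O_\mu(k^{-1/2})$ by the proposition already applied in the proof of Lemma 14 (there with $\alpha = 1$), now applied with $\alpha = 2$, $\mathcal{H}_1 = \mathcal{U}_{j,1}$, and $\mathcal{H}_2 = \mathcal{G}$. This shows that the expression $II$ is $O_\mu\bigl(k^{-(t-j)/(2t+1) - 1/2}\bigr)$ for every real $j$ with $1/2 < j < t$.

Next, observe that $|\bar p_k(\theta) - p_\theta| = \xi |\tilde p_k(\theta) - p_\theta| \leq |\tilde p_k(\theta) - p_\theta|$ and that $\bar p_k(\theta) \geq \zeta$, $p_\theta \geq \zeta$ as these functions belong to $\mathcal{P}(t,\zeta,D)$. Hence

$$III \leq 2\zeta^{-3} C_t D G \sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_2^2,$$

where $G < \infty$ is a $\|\cdot\|_\Omega$-norm bound for $\mathcal{G}$. (Here we have repeatedly used Proposition 1(b).) Theorem 12 then shows that expression $III$ is $O_\mu\bigl(k^{-2t/(2t+1)}\bigr)$. Putting things together we obtain that the l.h.s. of (20) is $O_\mu\bigl(k^{-(t-j)/(2t+1) - 1/2}\bigr)$ for every real $j$ with $1/2 < j < t$, and hence a fortiori for every real $j > 1/2$. Consequently,

$$\sup_{\theta \in \Theta} \sup_{g \in \mathcal{G}} \left| \sqrt{k} \int_\Omega p_\theta^{-1}(\tilde p_k(\theta) - p_\theta) g \, d\lambda - \sqrt{k}(\mu_k - \mu)(p_\theta^{-1} g)(\rho(\cdot,\theta)) \right| = o_\mu\bigl(k^{-(t-j)/(2t+1)}\bigr) \tag{21}$$

for every real $j > 1/2$.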

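Let now $\mathcal{F}$ be a nonempty bounded subset of $\mathsf{W}_2^t(\Omega)$ and let $B < \infty$ denote a $\|\cdot\|_{t,2}$-norm bound for $\mathcal{F}$. Define $\pi_{\theta'}(f) = \bigl(f - \int_\Omega f p_{\theta'} \, d\lambda\bigr) p_{\theta'}$ for any $f \in \mathsf{W}_2^t(\Omega)$ and $\theta' \in \Theta$. Then, using Proposition 1(a) and the fact that $p_{\theta'} \in \mathcal{P}(t,\zeta,D)$ by Assumption P.3, gives

$$\begin{aligned}
\sup_{\theta' \in \Theta} \sup_{f \in \mathcal{F}} \|\pi_{\theta'}(f)\|_{t,2} &\leq M_t \sup_{\theta' \in \Theta} \sup_{f \in \mathcal{F}} \left\| f - \int_\Omega f p_{\theta'} \, d\lambda \right\|_{t,2} \|p_{\theta'}\|_{t,2} \\
&\leq M_t D \Bigl( B + \sup_{f \in \mathcal{F}} \|f\|_\Omega \|1\|_{t,2} \Bigr) \\
&\leq M_t D B \bigl(1 + C_t \lambda(\Omega)^{1/2}\bigr) < \infty.
\end{aligned} \tag{22}$$

This shows that the set

$$\mathcal{G}(\Theta, \mathcal{F}) = \{\pi_{\theta'}(f) : f \in \mathcal{F}, \theta' \in \Theta\}$$

is a nonempty bounded subset of $\mathsf{W}_2^t(\Omega)$. In fact, it is a subset of $\mathsf{H}_t^0$ by definition of $\pi_{\theta'}$. It is now easy to see that applying (21) to $\mathcal{G}(\Theta, \mathcal{F})$ implies (19) in the case $s = t$. The case $s > t$ immediately follows, since every nonempty bounded subset of $\mathsf{W}_2^s(\Omega)$ with $s > t$ can also be viewed as a nonempty bounded subset of $\mathsf{W}_2^t(\Omega)$ by Proposition 1(c). This proves Part (a) in case $\zeta > 0$ and $s \geq t$.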

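Step 2: We now consider the case where $\zeta > 0$ and $1/2 < s < t$. For every $f \in \mathcal{F}$ let $u_k(f) \in \mathsf{W}_2^t(\Omega)$ be the approximators defined in the proof of Proposition 1 in Nickl (2007). They have the following properties:

$$\sup_{f \in \mathcal{F}} \|u_k(f)\|_{t,2} = O\bigl(k^{(t-s)/(2t+1)}\bigr) \quad \text{as } k \to \infty, \tag{23}$$

where $\sup_{f \in \mathcal{F}} \|u_k(f)\|_{t,2}$ is finite for every $k \in \mathbb{N}$; and, for every $r$, $0 \leq r < s$,

$$\sup_{f \in \mathcal{F}} \|f - u_k(f)\|_{r,2} = O\bigl(k^{-(s-r)/(2t+1)}\bigr) \quad \text{as } k \to \infty. \tag{24}$$

We have that

$$\begin{aligned}
&\sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} \left| \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) f \, d\lambda - \sqrt{k}(\mu_k - \mu) f(\rho(\cdot\,;\theta)) \right| \\
&\quad \leq \sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} \left| \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta)(f - u_k(f)) \, d\lambda \right| \\
&\qquad + \sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} \left| \sqrt{k}(\mu_k - \mu)\bigl(f(\rho(\cdot,\theta)) - u_k(f)(\rho(\cdot,\theta))\bigr) \right| \\
&\qquad + \sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} \left| \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) u_k(f) \, d\lambda - \sqrt{k}(\mu_k - \mu) u_k(f)(\rho(\cdot\,;\theta)) \right| \\
&\quad = IV + V + VI.
\end{aligned}$$

We now derive bounds for each of the above expressions:

Using (24) with $r = 0$, the Cauchy-Schwarz inequality, and Theorem 12 we obtain

$$IV \leq \sqrt{k} \sup_{f \in \mathcal{F}} \|f - u_k(f)\|_2 \sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_2 = O_\mu\bigl(k^{-(s - 1/2)/(2t+1)}\bigr).$$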
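The exponent in the bound for Expression $IV$ comes from multiplying three powers of $k$. As a short check, assuming the two rates entering the bound are $k^{-s/(2t+1)}$ (the approximation property with $r = 0$) and $k^{-t/(2t+1)}$ (Theorem 12 with $s = 0$), as we read the text:

```latex
\begin{align*}
\sqrt{k}\cdot k^{-s/(2t+1)}\cdot k^{-t/(2t+1)}
  &= k^{\frac{(2t+1) - 2s - 2t}{2(2t+1)}}
   = k^{-\frac{2s-1}{2(2t+1)}}
   = k^{-(s-1/2)/(2t+1)} .
\end{align*}
```

Since $s > 1/2$ in this step, the exponent is negative, so Expression $IV$ indeed vanishes, and it is of smaller order than $k^{-(s-j)/(2t+1)}$ for every $j > 1/2$.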

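Next, choose an arbitrary real $j$ such that $1/2 < j < s$ and observe that

$$\begin{aligned}
V &= \sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} \left| \sqrt{k}(\mu_k - \mu)(f - u_k(f))(\rho(\cdot\,;\theta)) \right| \\
&\leq \sup_{\theta \in \Theta} \sup_{h \in \mathcal{U}_{j,1}} \left| \sqrt{k}(\mu_k - \mu) h(\rho(\cdot,\theta)) \right| \sup_{f \in \mathcal{F}} \|f - u_k(f)\|_{j,2} \\
&= \|\sqrt{k}(\mu_k - \mu)\|_{\mathcal{U}_{j,1}^*} \sup_{f \in \mathcal{F}} \|f - u_k(f)\|_{j,2},
\end{aligned} \tag{25}$$

where $\mathcal{U}_{j,1}^* = \{h(\rho(\cdot,\theta)) : \theta \in \Theta, h \in \mathcal{U}_{j,1}\}$. Since $j > 1/2$, the class of functions $\mathcal{U}_{j,1}^*$ is $\mu$-Donsker by Proposition 11(a), hence

$$\|\sqrt{k}(\mu_k - \mu)\|_{\mathcal{U}_{j,1}^*} = O_\mu(1) \tag{26}$$

in view of Prohorov's theorem, measurability following from Proposition 37. Making use of (24), it follows that the r.h.s. of (25), and hence Expression $V$, is $O_\mu\bigl(k^{-(s-j)/(2t+1)}\bigr)$.

Finally note that Expression $VI$ is bounded by

$$\sup_{\theta \in \Theta} \sup_{h \in \mathcal{U}_{t,1}} \left| \sqrt{k} \int_\Omega (\tilde p_k(\theta) - p_\theta) h \, d\lambda - \sqrt{k}(\mu_k - \mu) h(\rho(\cdot,\theta)) \right| \sup_{f \in \mathcal{F}} \|u_k(f)\|_{t,2}.$$

Since $\mathcal{U}_{t,1}$ is a nonempty bounded subset of $\mathsf{W}_2^t(\Omega)$ and since Part (a) has already been established in Step 1 for such sets of functions, the first term on the r.h.s. of the last display is $o_\mu\bigl(k^{-(t-j)/(2t+1)}\bigr)$, and using (23) we conclude that Expression $VI$ is $o_\mu\bigl(k^{-(s-j)/(2t+1)}\bigr)$.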
The above bounds imply that the l.h.s. of (19) is $O_\mu\bigl(k^{-(s-j)/(2t+1)}\bigr)$ for all $1/2 < j < s$, and hence is $o_\mu\bigl(k^{-(s-j)/(2t+1)}\bigr)$ for all $j > 1/2$. This completes the proof of Part (a) of the theorem in case $\zeta > 0$.

Step 3: We next consider the case $\zeta = 0$. In view of Assumption P.3 we may choose $\chi > 0$ such that $\inf_{\Omega \times \Theta} p(x,\theta) > \chi$. Then, by Remark 9(iii), there are events that have probability tending to $1$ on which $\inf_{\theta \in \Theta} \inf_{x \in \Omega} \tilde p_k(\theta)(x) > \chi$ holds true. Since $\mathcal{P}(t,\chi,D) \subseteq \mathcal{P}(t,0,D) = \mathcal{P}(t,\zeta,D)$, we have that on these events $\tilde p_k(\theta)$ coincides with the NPML-estimators over the smaller set $\mathcal{P}(t,\chi,D)$. Part (a) in case $\zeta = 0$ now follows from what has already been established in the preceding two steps (applied to the NPML-estimator based on $\mathcal{P}(t,\chi,D)$ instead of $\mathcal{P}(t,\zeta,D)$ and noting that Assumption P.3 is also satisfied relative to $\mathcal{P}(t,\chi,D)$).

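Part (b): In view of Part (a) it is sufficient to show that $(\theta, f) \mapsto \sqrt{k}(\mu_k - \mu) f(\rho(\cdot,\theta))$ converges weakly in $\ell^\infty(\Theta \times \mathcal{F})$ to $\mathbb{G}(\theta, f)$. To this end, let

$$H(\psi)(\theta, f) = \psi(f(\rho(\cdot,\theta)))$$

for every $\psi \in \ell^\infty(\mathcal{F}^*)$, $\theta \in \Theta$, and $f \in \mathcal{F}$, where $\mathcal{F}^* = \{f(\rho(\cdot,\theta)) : \theta \in \Theta, f \in \mathcal{F}\}$. Note that the resulting mapping $H : \ell^\infty(\mathcal{F}^*) \to \ell^\infty(\Theta \times \mathcal{F})$ is continuous since $H$ is linear and

$$\|H(\psi)\|_{\Theta \times \mathcal{F}} = \sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} |\psi(f(\rho(\cdot\,;\theta)))| = \|\psi\|_{\mathcal{F}^*}$$

for all $\psi \in \ell^\infty(\mathcal{F}^*)$. In fact, $H$ is an isometry. Since $\mathcal{F}^*$ is $\mu$-Donsker by Proposition 11(a), $\sqrt{k}(\mu_k - \mu)$ converges weakly in $\ell^\infty(\mathcal{F}^*)$ to a $\mu$-Brownian bridge $\mathbb{G}^*$, that is, $\mathbb{G}^*$ is a mean-zero Gaussian process indexed by $\mathcal{F}^*$, which is measurable as a mapping with values in $\ell^\infty(\mathcal{F}^*)$, has covariance function

$$\mathrm{Cov}[\mathbb{G}^*(f(\rho(\cdot,\theta))), \mathbb{G}^*(g(\rho(\cdot,\theta')))] = \int_V \left( f(\rho(\cdot\,;\theta)) - \int_V f(\rho(\cdot\,;\theta)) \, d\mu \right) \left( g(\rho(\cdot\,;\theta')) - \int_V g(\rho(\cdot,\theta')) \, d\mu \right) d\mu,$$

and has sample paths that are uniformly continuous with respect to the pseudo-metric

$$d^*(f(\rho(\cdot,\theta)), g(\rho(\cdot\,;\theta'))) = \bigl(\mathrm{Var}[\mathbb{G}^*(f(\rho(\cdot,\theta))) - \mathbb{G}^*(g(\rho(\cdot,\theta')))]\bigr)^{1/2}.$$

Since the empirical process $\sqrt{k}(\mu_k - \mu)$ indexed by $\mathcal{F}^*$ is mapped into the process $(\theta, f) \mapsto \sqrt{k}(\mu_k - \mu) f(\rho(\cdot,\theta))$ by the map $H$, the continuous mapping theorem shows that the latter process converges weakly in $\ell^\infty(\Theta \times \mathcal{F})$ to $\mathbb{G} := H(\mathbb{G}^*)$. The properties of $\mathbb{G}$ claimed in the theorem follow easily from the corresponding properties of the $\mu$-Brownian bridge $\mathbb{G}^*$ and the fact that $H$ is an isometry.

Part (c): Follows directly from Part (b) in view of Prohorov's theorem, with measurability again following from Proposition 37(b) in Appendix D. ■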

We next obtain a corollary showing that $\sqrt{k}\int_\Omega(\hat p_k(\theta)-p_\theta)(\cdot)\,d\lambda$ converges in $\ell^\infty(\mathcal{F})$ to $G(\theta)$ uniformly over $\theta$, where $G(\theta)(f) := G(\theta,f)$ for all $f\in\mathcal{F}$. For this we recall the following definitions: Let $(S,d)$ be a metric space. For probability spaces $(\Lambda_1,\mathcal{A}_1,P_1)$, $(\Lambda_2,\mathcal{A}_2,P_2)$ and mappings $Y_1 : \Lambda_1 \to S$, $Y_2 : \Lambda_2 \to S$ such that $Y_2$ is $\mathcal{A}_2$-$\mathcal{B}(S,d)$-measurable and has separable range, define an analogue of the dual bounded Lipschitz metric by

\[
\beta_{(S,d)}(Y_1,Y_2) = \sup\left\{\left|\int^* h(Y_1)\,dP_1 - \int h(Y_2)\,dP_2\right| : \|h\|_{BL(S,d)} \le 1\right\},
\]

where $\int^*$ denotes the outer integral and $\|\cdot\|_{BL(S,d)}$ denotes the bounded Lipschitz norm; cf. the definition on p. 115 in Dudley (1999). By Theorem 3.6.4 in Dudley (1999), $Y_n$ converges weakly to $Y$ (where $Y$ is measurable and has separable range) if and only if

\[
\lim_{n\to\infty}\beta_{(S,d)}(Y_n,Y) = 0.
\]
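Corollary 16 Let the hypotheses of Theorem 15 be satisfied. Then, for every $\theta\in\Theta$, $G(\theta) = G(\theta,\cdot)$ is a measurable mapping with values in $\ell^\infty(\mathcal{F})$ that has separable range. Furthermore,

\[
\lim_{k\to\infty}\,\sup_{\theta\in\Theta}\,\beta_{\ell^\infty(\mathcal{F})}\Big(\sqrt{k}\int_\Omega(\hat p_k(\theta)-p_\theta)(\cdot)\,d\lambda,\ G(\theta)(\cdot)\Big) = 0.
\]

[In fact, $G(\theta)$ is a $P_\theta$-Brownian bridge where $P_\theta$ denotes the probability measure corresponding to $p_\theta$.]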

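Proof. Let $\theta\in\Theta$ be fixed, and define $H_\theta(\psi)(f) = \psi(\theta,f)$ for every $\psi\in\ell^\infty(\Theta\times\mathcal{F})$ and $f\in\mathcal{F}$. This gives a Lipschitz mapping $H_\theta : \ell^\infty(\Theta\times\mathcal{F}) \to \ell^\infty(\mathcal{F})$ whose Lipschitz constant is 1 and hence is independent of $\theta$. Clearly, $G(\theta) = H_\theta(G)$ holds. Since $G$ is a measurable mapping with separable range in $\ell^\infty(\Theta\times\mathcal{F})$ by Part (b) of Theorem 15, this shows that, for every $\theta\in\Theta$, $G(\theta)$ is measurable with separable range in $\ell^\infty(\mathcal{F})$. Further, since the composition of Lipschitz mappings with Lipschitz constant at most 1 is again Lipschitz with Lipschitz constant at most 1, it follows that

\[
\begin{aligned}
&\sup_{\theta\in\Theta}\beta_{\ell^\infty(\mathcal{F})}\Big(\sqrt{k}\int_\Omega(\hat p_k(\theta)-p_\theta)(\cdot)\,d\lambda,\ G(\theta)(\cdot)\Big)\\
&\quad= \sup_{\theta\in\Theta}\beta_{\ell^\infty(\mathcal{F})}\Big(H_\theta\Big(\sqrt{k}\int_\Omega(\hat p_k(\cdot)-p_{\cdot})(\cdot)\,d\lambda\Big),\ H_\theta\big(G(\cdot)(\cdot)\big)\Big)\\
&\quad\le \beta_{\ell^\infty(\Theta\times\mathcal{F})}\Big(\sqrt{k}\int_\Omega(\hat p_k(\cdot)-p_{\cdot})(\cdot)\,d\lambda,\ G(\cdot)(\cdot)\Big).
\end{aligned}
\]

The r.h.s., and therefore the l.h.s., of the previous display converges to 0 by Part (b) of Theorem 15. That $G(\theta)$ is in fact a $P_\theta$-Brownian bridge indexed by $\mathcal{F}$ easily follows from Part (b) of Theorem 15 and the transformation theorem. ■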

The statement in Corollary 16 is in fact independent of any distance describing the concept of weak convergence in $\ell^\infty(\mathcal{F})$, see Remark 18 in Gach and Pötscher (2010) for more discussion.

Remark 17 We have assumed that the processes $(X_i)$ and $(V_i)$ are canonically defined, i.e., are given by the respective coordinate projections of the measurable space $(\Omega^{\mathbb{N}} \times V^{\mathbb{N}},\ \mathcal{B}(\Omega)^{\mathbb{N}} \otimes \mathcal{V}^{\mathbb{N}})$. We have made this assumption to be able to freely use results from empirical process theory as well as from Nickl (2007), which typically are formulated in this canonical setting. However, the measurability results in Appendix D show that all results of the paper continue to hold if $(X_i)$ and $(V_i)$ are defined on an arbitrary probability space.



5 Simulation-Based Minimum Distance Estimators 

We next study simulation-based minimum distance (indirect inference) estimators when the auxiliary density estimators are the NPML-estimators $\hat p_n$ and $\hat p_k(\theta)$ based on the given auxiliary model $\mathcal{P}(t,\zeta,D)$. To this end we define for every $\theta\in\Theta$

\[
Q_{n,k}(\theta) =
\begin{cases}
\int_\Omega (\hat p_n - \hat p_k(\theta))^2\,\hat p_n^{-1}\,d\lambda & \text{if } \hat p_n(x) > 0 \text{ for all } x\in\Omega,\\
\infty & \text{otherwise,}
\end{cases}
\qquad (27)
\]

and

\[
Q_n(\theta) =
\begin{cases}
\int_\Omega (\hat p_n - p_\theta)^2\,\hat p_n^{-1}\,d\lambda & \text{if } \hat p_n(x) > 0 \text{ for all } x\in\Omega,\\
\infty & \text{otherwise.}
\end{cases}
\]

Note that $Q_{n,k}$ as well as $Q_n$ take their values in $[0,\infty]$. By separability of $\Omega$ and continuity of $\hat p_n$, the set $\{\hat p_n(x) > 0 \text{ for all } x\in\Omega\}$ belongs to the $\sigma$-field $\mathcal{B}(\Omega)^n$. Since $\hat p_n$ and $\hat p_k(\theta)$, respectively, are jointly measurable by Remark [6Ui), it follows from Tonelli's theorem that $Q_{n,k}(\theta)$ is $\mathcal{B}(\Omega)^n \otimes \mathcal{V}^k$-measurable and that $Q_n(\theta)$ is $\mathcal{B}(\Omega)^n$-measurable for every $\theta\in\Theta$. [Assigning the value $\infty$ on the complement of $\{\hat p_n(x) > 0 \text{ for all } x\in\Omega\}$ to both objective functions is arbitrary and irrelevant for the asymptotic considerations to follow.]

A simulation-based minimum distance (SMD) estimator is now a mapping $\hat\theta_{n,k} : \Omega^n \times V^k \to \Theta$ that minimizes $Q_{n,k}$ over $\Theta$ whenever the minimum exists (and is defined arbitrarily otherwise). Similarly, a minimum distance (MD) estimator is a mapping $\hat\theta_n : \Omega^n \to \Theta$ that minimizes $Q_n$ over $\Theta$ whenever the minimum exists (and is defined arbitrarily otherwise). The MD-estimator is of course only feasible if a closed form expression for $p_\theta$ can be found; here it serves as an auxiliary device for proving asymptotic results for the SMD-estimator.
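The definition of the SMD-estimator can be illustrated numerically. The following Python sketch is not the estimator studied here: as a simplifying assumption it replaces the NPML-estimators by histogram density estimators and minimizes the resulting analogue of $Q_{n,k}$ over a finite grid. The parametric model $p_\theta(x) = 1 + \theta(2x-1)$ on $(0,1)$, the simulation mechanism `rho` (the inverse distribution function applied to uniform draws), and all function names are hypothetical choices made only for this illustration.

```python
import numpy as np

def rho(v, theta):
    """Hypothetical simulation mechanism: inverse c.d.f. of the density
    p_theta(x) = 1 + theta * (2x - 1) on (0, 1), applied to uniform draws v."""
    a = 1.0 - theta
    return (-a + np.sqrt(a * a + 4.0 * theta * v)) / (2.0 * theta)

def hist_density(sample, bins):
    """Histogram density estimate on (0, 1); a stand-in for the NPML-estimator."""
    counts, edges = np.histogram(sample, bins=bins, range=(0.0, 1.0))
    width = edges[1] - edges[0]
    return counts / (sample.size * width), width

def q_nk(theta, p_n, v, bins, width):
    """Histogram analogue of Q_{n,k}(theta); infinite if p_n vanishes somewhere."""
    if np.any(p_n <= 0.0):
        return np.inf
    p_k, _ = hist_density(rho(v, theta), bins)
    return float(np.sum((p_n - p_k) ** 2 / p_n) * width)

def smd_estimate(x, v, grid, bins=20):
    """Grid minimizer of the objective; the same draws v are reused for every theta."""
    p_n, width = hist_density(x, bins)
    values = [q_nk(th, p_n, v, bins, width) for th in grid]
    return float(grid[int(np.argmin(values))])

rng = np.random.default_rng(0)
theta_true = 0.5
x = rho(rng.uniform(size=5000), theta_true)   # observed data X_1, ..., X_n
v = rng.uniform(size=20000)                   # simulation draws V_1, ..., V_k
grid = np.linspace(0.1, 0.9, 81)
theta_hat = smd_estimate(x, v, grid)
print(theta_hat)
```

Reusing the same draws `v` for every value of `theta` mirrors the way $\hat p_k(\theta)$ is built from a single simulated sample $V_1,\ldots,V_k$ passed through $\rho(\cdot,\theta)$.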

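Furthermore, whenever Assumption D.2 is satisfied, we define

\[
Q(\theta) = \int_\Omega (p_\diamond - p_\theta)^2\,p_\diamond^{-1}\,d\lambda,
\]

which takes its values in $[0,\infty]$. In view of convergence of $\hat p_n$ to $p_\diamond$ and of $\hat p_k(\theta)$ to $p_\theta$ (under the assumptions of Theorem [5]), $Q$ can be viewed as the limiting counterpart of both $Q_{n,k}$ as well as $Q_n$.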

5.1 Consistency of SMD-Estimators 

Before turning to consistency, we show that MD- and SMD-estimators in fact minimize their 
corresponding objective function at least on events that have probability tending to 1. Note that 
in the following proposition the statement of Part (c) is stronger than the one of Part (b), but 
also requires additional assumptions. 

Proposition 18 Let Assumption R.1 be satisfied.

(a) Suppose $\zeta > 0$ holds. Then any SMD-estimator $\hat\theta_{n,k}$ minimizes $Q_{n,k}$ for every $(x_1,\ldots,x_n,v_1,\ldots,v_k) \in \Omega^n \times V^k$. Furthermore, there exists an SMD-estimator that is $\mathcal{B}(\Omega)^n \otimes \mathcal{V}^k$-$\mathcal{B}(\Theta)$-measurable.

(b) Suppose $\zeta = 0$ and Assumptions D.1 and \D.!^ hold. Then there are events $A_n \in \mathcal{B}(\Omega)^n$ having probability converging to 1 as $n \to \infty$ such that, on the events $A_n \times V^k$ and for every $k \in \mathbb{N}$, any SMD-estimator $\hat\theta_{n,k}$ minimizes $Q_{n,k}$.

(c) Suppose $\zeta = 0$ and Assumptions [PHl \D.2[ \P.1\ and \P.2\ hold. Then, for every constant $\chi > 0$ satisfying $\inf_{x\in\Omega} p_\diamond(x) > \chi$ and $\inf_{\Omega\times\Theta} p(x,\theta) > \chi$, there are events $C_{n,k} \in \mathcal{B}(\Omega)^n \otimes \mathcal{V}^k$ that have probability tending to 1 as $\min(n,k) \to \infty$ such that on $C_{n,k}$ any SMD-estimator $\hat\theta_{n,k}$ coincides with an SMD-estimator that is obtained from using $\mathcal{P}(t,\chi,D)$ instead of $\mathcal{P}(t,\zeta,D)$ as the underlying auxiliary model.

Proof, (a) By Proposition l4lT b) in Appendix |F1 Qn,k is continuous and real- valued on the 
compact set Q for each (xi, . . . , x„, ui, . . . , w^) G fi" x V'^ implying that any 9n,k is a minimizer for 
each (xi, . . . , a;„, wi, . . . , Vk). Since Qn,k is also a measurable function in (xi, . . . , x„, wi, . . . , Vk) 
for each fixed G ©, as shown earlier, the existence of a measurable selection follows from 
Lemma A3 in Potscher and Prucha (1997). 
(b) By Remark [5]Ji) there are events $A_n \in \mathcal{B}(\Omega)^n$ that have probability tending to 1 as $n \to \infty$
on which mix^QPn{x) > mix^nPA{x) > 0. From Proposition HTT b') it follows that Qn.k is 
continuous and real- valued on Q for each {xi, . . . ,Xn,vi, . . . ,Vk) G A„ x V''. Compactness of 8 
completes the proof. 

(c) Let X be as in the proposition. Set Cn.k = x Bk, where An and Bk are as in 
Remarks ini[i) and (iii), and observe that Cn.k has probability tending to 1 as min(ri, fc) — > oo. By 
Remark ini we have on Cn,k that infxeoPTi(a;) > x and infoxe Pfc(^')(a;) > X- Since V{t,XyD) C 
'P{tX,D), it follows that on Cn,k the NPML-estimators Pn and Pk{d), respectively, coincide 
with the corresponding NPML-estimators based on the auxiliary model 'P{t,X:D) instead of 
'P{tX,D). Therefore, on Cn,k, the objective function Qn,k coincides with the corresponding 
objective function based on the auxiliary model V{t,x,D), and thus 9n,k coincides with the 
corresponding SMD-estimator based on the auxiliary model V{t,Xi D). ■ 

The proofs of Parts (a) and (b) of the subsequent proposition are analogous to the proofs of 
Proposition [TH] above. Part (c) follows immediately from compactness of Q and Lemma [3D] in 
Appendix |F1 

Proposition 19 Suppose Vq C £^(£7) and 6 i~> pg is a continuous map from Q into (£^(11), || • 
II2). 

(a) Suppose C > holds. Then any MD-estimator 0„ minimizes Q„ for every . . . , Xn) S 
57". Furthermore, there exists an MD-estimator On that is B{^1)^ -B{Q)- measurable. 

(b) Suppose C = and Assumvtions \DJ\ and \D.^ hold. Then there are events An G B^fl)" 
that have probability tending to I as n ^ 00 such that, on these events, any MD-estimator On 
minimizes Q„. [In fact, more is true: If x > satisfies inixenPAix) > X> then, on An, any 
MD-estimator On coincides with an MD-estimator that is obtained by using V{t, x, D) instead of 
'P{t, C, D) as the underlying auxiliary model.] 

(c) Suppose Assumvtion lD.B is satisfied. Then Q attains its minimum on Q. 

Remark 20 Assumption IP.4I together with a uniform integrability condition on {pg : e 0} 
clearly implies that Ve C £^(57) and that 1-^ pg is a. continuous mapping from 9 into (£^(57), || • 
1 1 2). In particular. Assumptions IP.l l and IP.4I together are sufficient. 

Proposition 21 (a) Let Assumptions {DTJI \D.SX IP.il \P.SX and \R.1\ be satisfied. If Q has a 
unique minimizer Oq over Q, then any SMD-estimator On.k converges to Oq in outer probability 
as min(n, k) ^ 00. 

(b) Suppose Vq ^ £^(17) and ^ pe is a continuous map from Q into (£^(f7),|| • II2). 
Let Assumvtions \D71\ and \D.^ be satisfied. If Q has a unique minimizer Oq over Q, then any 
MD-estimator On converges to Oq in outer probability as n 00. 

Proof, (a) Note that Q is continuous (by Remark [20l Proposition [29] in Appendix [21 and 
Proposition WT\c) in Appendix IF]) . and that Q{0) > Q{Oq) for any ^ Oq hy assumption. 
Furthermore, Qn,ki(^) converges to Q{0) uniformly over the compact set Q in outer probability 
as min(n, fc) — >■ cx) by Proposition I42r b) in Appendix [F] A standard argument together with 
Proposition [T5I gives the result. For more details see Gach and Potscher (2010). 
(b) Analogous. ■ 

Remark 22 (i) It follows from Proposition[25]in Appendix [51 together with Remark [20] that the 
assumptions of Proposition I19f c) are satisfied under the assumptions of Part (a) of the above 
proposition (and they are trivially satisfied under the assumptions of Part (b)). Consequently, 
under the assumptions of the above proposition, Q always has a minimizer over Q. Hence, the 
assumption in the above proposition that $Q$ has a unique minimizer is in fact only a uniqueness assumption.

(ii) We do not strive for utmost generality in the consistency result for MD-estimators; possible relaxations lie in weakening the assumptions that $\mathcal{P}_\Theta \subseteq \mathcal{L}^2(\Omega)$ and that $\theta_0^*$ is unique.



5.2 Asymptotic Normality of SMD-Estimators 

We next show that SMD- and MD-estimators are asymptotically normally distributed, with 
their asymptotic variance-covariance matrix coinciding with the inverse of the Fisher-information 
matrix in case the parametric model Vq is correctly specified. We first prove the result for MD- 
estimators and then show how this can be carried over to SMD-estimators. To this end we 
introduce a further assumption which is standard in maximum likelihood theory. 

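Assumption P.5 The interior $\Theta^\circ$ of $\Theta \subseteq \mathbb{R}^m$ is non-empty. For every $x\in\Omega$ the function $\theta \mapsto p(x,\theta)$ is twice continuously partially differentiable on $\Theta^\circ$, and the following domination conditions hold for all $i,j = 1,\ldots,m$:

\[
\int_\Omega \sup_{\theta\in\Theta^\circ}\left|\frac{\partial p}{\partial\theta_i}(x,\theta)\right| d\lambda(x) < \infty,
\qquad
\int_\Omega \sup_{\theta\in\Theta^\circ}\left|\frac{\partial^2 p}{\partial\theta_i\,\partial\theta_j}(x,\theta)\right| d\lambda(x) < \infty.
\]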

We note that under the assumptions of the subsequent theorem, as well as under the assump- 
tions of Theorem [551 the function Q always possesses a minimizer (cf. Proposition \Wlc) and 
Remark [501 as well as Proposition [5S1 in Appendix [^ in case of Theorem [55]) ; furthermore, the 
Hessian matrix of Q{6) exists for every 9 £ Q°, cf. Lemma in Appendix IF] which provides an 
explicit formula. We shall write J{9) for 1/2 times the Hessian matrix of Q{0). 



Theorem 23 Let Assumptions D.1, P.1, P.2, P.4, P.5 be satisfied. Suppose that the minimizer $\theta_0^*$ of $Q$ over $\Theta$ is unique and belongs to $\Theta^\circ$, and suppose that the matrix $J(\theta_0^*)$ is positive definite. Furthermore, assume that the first-order partial derivatives $\frac{\partial p}{\partial\theta_i}(\cdot,\theta_0^*)$ belong to $W_2^s(\Omega)$ for some $s > 1/2$ and for all $i = 1,\ldots,m$. Then

\[
\sqrt{n}\,(\hat\theta_n - \theta_0^*) \xrightarrow{d} N\big(0,\ J(\theta_0^*)^{-1} I(\theta_0^*) J(\theta_0^*)^{-1}\big) \quad\text{as } n\to\infty,
\]

where $I(\theta_0^*)$ is given by

which is well-defined and nonnegative definite. If, additionally, $\mathcal{P}_\Theta$ is correctly specified in the sense that $p_\diamond = p_{\theta_0}$ a.e. for some $\theta_0\in\Theta$, then $\theta_0^* = \theta_0$ and $I(\theta_0) = J(\theta_0)$ hold, and $I(\theta_0)$ coincides with the Fisher-information matrix.

Proof. Step 1: Assume first that C > 0. By Proposition [5TTb). 0„ belongs to a sufficiently 
small open ball, centered at 9'^ and contained in 8°, on subsets En of the sample space that have 
inner probability tending to 1 as n — >■ cx). Consequently, 



d9 







holds on En- Applying the mean- value theorem to each component of dQn/dd then yields on 

En 



"^(^S) + J{0o)V^{9n - 9*) + - J{9*))V^{9n - 9*) = 0, 



(28) 






where is the Hessian matrix of Q„ with i-th row evaluated at some mean value 9n,i on the 
line segment that joins 9q and Observe that iJ„ converges to the invertible matrix J{Oq) in 
outer probability by Proposition ETl Proposition's] in Appendix iFl and continuity of J{9) on Q° 
(cf. Lemma l44l in Appendix IF| . We next show that the score evaluated at 9q satisfies a central 
limit theorem. To this end let v G R™ be arbitrary, and use Lemma HW a) to obtain 

Jn oa p^ 

-2^1 ip^-p,,)v'^i;9*o)-d\ 
Jn ou p^ 

= I + II + III. 

Observe that Expression /// equals ^/nv' {dQ / d9){9'^) by Lemma |33fb) in Appendix iFl Since 6'g 
is an interior minimizer of Q by assumption, Expression /// is 0. 

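Convergence of I: By assumption $v'\frac{\partial p}{\partial\theta}(\cdot,\theta_0^*)$ belongs to $W_2^s(\Omega)$ with $s > 1/2$ and is thus sup-norm bounded by $C_s\|v'\frac{\partial p}{\partial\theta}(\cdot,\theta_0^*)\|_{s,2} < \infty$. Clearly, $\|p_{\theta_0^*}\,\hat p_n^{-1}\,p_\diamond^{-1}\|_\Omega \le \zeta^{-2} C_t D$ holds in view of Assumption P.1. Hence,

\[
|I| \le 2\,C_s\left\|v'\frac{\partial p}{\partial\theta}(\cdot,\theta_0^*)\right\|_{s,2}\zeta^{-2}\,C_t D\,\sqrt{n}\,\|\hat p_n - p_\diamond\|_2^2.
\]

Consequently, Expression I converges to 0 in outer probability by Proposition 10(a) applied with $s = 0$.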

Convergence of II: Set r = min(s,i) > 1/2. Observe that -2v'{dp/d9){-,9l) G W^(17) by 
assumption, that pg-^ G W2(r2) by Assumption IP.ll and that p^ G W2(ri) by Assumption ID. II 
Since C > has been assumed, it follows that 

belongs to W2(fi) in view of Proposition[TJa),(d). Applying Theorem [TST a') with F = {/} we ob- 
tain that II converges in distribution to a centered normal distribution with variance Av' I{9q)v. 
By the Cramer- Wold device, y^{dQn/d9){6Q) asymptotically follows a centered normal dis- 
tribution with variance- covariance matrix 4/(^?q). Nonncgative definiteness of I{9q) is now an 
immediate consequence and the asymptotic distribution of ^/n{9n — 9q) follows easily from (|28p. 
The claims under correct specification of the model Vq follow easily from Lemma H^ b) in Ap- 
pendix [f] 

Step 2: Now assume that C = 0. Note that mixenPA{x) > and 'mifixep{x,6) > because 
of Assumptions [D73] and |P!2l Let % > be such that mixenPkix) > X and inffixO pix,9) > x- 
Then it follows from Proposition [TWb') that there are events that have probability tending to 
1 such that on these events dn coincides with an MD-estimator 9n that is based on V{t,x,D) 
instead of C, Z?). Since the assumptions of the theorem are also satisfied with V{t,x,D) 
instead of 'P{t,(,D), applying to 9n what has already been established in Step 1 completes the 
proof. ■ 
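The identity $I(\theta_0) = J(\theta_0)$ under correct specification can be checked numerically in a simple case. Differentiating $Q(\theta) = \int_\Omega (p_\diamond - p_\theta)^2 p_\diamond^{-1}\,d\lambda$ twice under the integral sign and setting $p_\diamond = p_{\theta_0}$ leaves $\frac{1}{2}Q''(\theta_0) = \int_\Omega (\partial p_\theta/\partial\theta)^2 p_{\theta_0}^{-1}\,d\lambda$, which is the Fisher information. The sketch below assumes the hypothetical one-parameter model $p_\theta(x) = 1 + \theta(2x-1)$ on $(0,1)$ (not a model from this paper) and uses a midpoint rule and a central second difference, both chosen only for illustration.

```python
import numpy as np

# Midpoint grid on Omega = (0, 1); np.mean over it approximates the integral.
x = (np.arange(20000) + 0.5) / 20000

def p(theta):
    """Hypothetical model density p_theta(x) = 1 + theta * (2x - 1) on (0, 1)."""
    return 1.0 + theta * (2.0 * x - 1.0)

theta0 = 0.5  # true parameter; the model is correctly specified

def Q(theta):
    """Limiting objective Q(theta) with p_diamond = p_{theta0}."""
    return np.mean((p(theta0) - p(theta)) ** 2 / p(theta0))

# J(theta0) = 1/2 times the second derivative of Q, via a central difference.
h = 1e-3
J = 0.5 * (Q(theta0 + h) - 2.0 * Q(theta0) + Q(theta0 - h)) / h ** 2

# Fisher information: integral of (dp/dtheta)^2 / p_{theta0}.
fisher = np.mean((2.0 * x - 1.0) ** 2 / p(theta0))

# Sandwich variance J^{-1} I J^{-1}; it collapses to 1 / I when I = J.
asymptotic_variance = fisher / J ** 2
print(J, fisher, asymptotic_variance)
```

Because $Q$ is quadratic in $\theta$ for this particular model, the second difference is exact up to rounding, so the two quantities agree to high precision.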

The following lemma will be instrumental in proving the asymptotic normality result for 
SMD-estimators. 



Lemma 24 Let $U \subseteq \mathbb{R}^m$ be a (non-empty) open, convex set. Let $f : U \to \mathbb{R}$ and $g : U \to \mathbb{R}$ be functions such that $g$ is twice partially differentiable on $U$ with Hessian satisfying

\[
y'\,\frac{\partial^2 g}{\partial x\,\partial x'}(x)\,y \ge K\|y\|^2 \qquad (29)
\]

for all $x\in U$, all $y\in\mathbb{R}^m$, and some $0 < K < \infty$. If $u$ is a minimizer of $f$ over $U$ and $v$ is a minimizer of $g$ over $U$, then

\[
\|u - v\| \le 2K^{-1/2}\,\|f - g\|_U^{1/2}.
\]

Proof. Suppose that minimizers $u$ and $v$ exist, since otherwise there is nothing to prove. As $v$ is a minimizer of the twice partially differentiable function $g$ on the convex open set $U$, the gradient of $g$ vanishes at $v$, and we have (by a pathwise Taylor series expansion) that

\[
g(u) = g(v) + \frac{1}{2}(u - v)'\,\frac{\partial^2 g}{\partial x\,\partial x'}(\bar v)\,(u - v),
\]

where $\bar v$ lies in the convex hull of $\{u,v\} \subseteq U$. By (29) we obtain

\[
\|u - v\| \le \sqrt{2}\,K^{-1/2}\,|g(u) - g(v)|^{1/2}. \qquad (30)
\]

Next, note the inequality

\[
f(u) - g(u) \le f(u) - g(v) \le f(v) - g(v),
\]

which implies

\[
|f(u) - g(v)| \le \|f - g\|_U,
\]

which in turn yields

\[
|g(u) - g(v)| \le |g(u) - f(u)| + |f(u) - g(v)| \le 2\|f - g\|_U.
\]

Plugged into (30) this proves the result. ■
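The bound of Lemma 24 is easy to check in a toy case. The sketch below is only an illustration under assumptions of our own choosing: $U = (-1,1)$, $g(y) = y^2$ (so that the Hessian condition holds with $K = 2$), and $f = g$ plus a small sinusoidal perturbation; minimizers and the sup-norm distance are approximated on a fine grid rather than computed exactly.

```python
import numpy as np

# Fine grid standing in for the open convex set U = (-1, 1).
grid = np.linspace(-1.0, 1.0, 2001)

def g(y):
    """Strongly convex function with g'' = 2, so the lemma applies with K = 2."""
    return y ** 2

def f(y):
    """Perturbed objective; the sup-norm distance to g is at most 0.1."""
    return g(y) + 0.1 * np.sin(5.0 * y)

K = 2.0
u = grid[np.argmin(f(grid))]                      # grid minimizer of f
v = grid[np.argmin(g(grid))]                      # grid minimizer of g (close to 0)
sup_dist = np.max(np.abs(f(grid) - g(grid)))      # grid approximation of ||f - g||_U

lhs = abs(u - v)
rhs = 2.0 * K ** -0.5 * np.sqrt(sup_dist)
print(lhs, rhs)
```

In the proof of Theorem 25 the same bound is applied with $f = Q_{n,k(n)}$ and $g = Q_n$, which is why only the square root of the sup-norm distance between the two objective functions enters.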

The asymptotic normality result for SMD-estimators is now as follows. 

Theorem 25 Let Assumptions [DTM \P.1[ IP. 51 be satisfied. Suppose that the minimizer 0q 
of Q over O is unique and belongs to 0° , suppose that the matrix J{0o) positive definite, and 
assume that the first-order partial derivatives ■§§-{■, Oq) belong to W|(ri) for some s > 1/2 and 



for all i ~ 1, . . . , m. Suppose further that either (i) Assumvtion \P . 2\ is satisfied and k(n) satisfies 
k{n) /n^^^^*' oo as n oo; or (ii) Assumvtion \P.3\ is satisfied and k{n) satisfies k{n)/n'^ oo 
as n ^ oo. Then 

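\[
\sqrt{n}\,(\hat\theta_{n,k(n)} - \theta_0^*) \xrightarrow{d} N\big(0,\ J(\theta_0^*)^{-1} I(\theta_0^*) J(\theta_0^*)^{-1}\big) \quad\text{as } n\to\infty,
\]

where $I(\theta_0^*)$ is given as in Theorem 23, is well-defined, and is nonnegative definite. If, additionally, $\mathcal{P}_\Theta$ is correctly specified in the sense that $p_\diamond = p_{\theta_0}$ a.e. for some $\theta_0\in\Theta$, then $\theta_0^* = \theta_0$ and $I(\theta_0) = J(\theta_0)$ hold, and $I(\theta_0)$ coincides with the Fisher-information matrix.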

Proof. Step 1: Assume that C > 0. Observe first that the assumptions of the current theorem 
imply the assumptions of Theorem 1231 noting that Assumption IP.4I follows from Assumptions 
IP.ll and lR.2l in view of Proposition [52] in Appendix]^ It hence suffices to prove that 

V^(^n,fe(n) - ^n) = Op,.(l) as n CX). (31) 
We achieve this by applying Lemma 24 to the objective functions $Q_{n,k}$ and $Q_n$: Let $U$ be
a sufficiently small open, convex neighbourhood of 6q that is contained in 0° such that the 
smallest eigenvalue of J{9) is bounded from below by a positive constant for all 6 G U, the 
constant not depending on 0. Such a set U exists, since J{Oq) is positive definite by assumption 
and J{6) is continuous on 8° by Lemma in Appendix iFl Since for all i, j — 1, ... ,711 



sup 



89,39 ' d9,d9.j 



= op(l) as n — >■ 00 



by Proposition's] in Appendix |F1 it follows that there are events En having probability tending 
to 1 as n — cx) such that on En 

9& dOoO 

holds for some constant K > Q which does not depend on n or the data. By Propositions 
9n and 9n,k{n) belong to U on subsets E'„ of the sample space whose inner probability goes to 1 
as n — 7^ cx). For the rest of the proof of Step 1 we restrict our reasoning to the events EnCi E'^, 
and note that they have inner probability tending to 1 as $n \to \infty$. By Proposition 19(a) and Proposition 18(a) the estimators $\hat\theta_n$ and $\hat\theta_{n,k(n)}$, respectively, minimize the objective functions $Q_n$ and $Q_{n,k(n)}$. Hence, we may apply Lemma 24 with $f = Q_{n,k(n)}|_U$, $g = Q_n|_U$, $u = \hat\theta_{n,k(n)}$, and $v = \hat\theta_n$ to obtain

\[
\|\hat\theta_{n,k(n)} - \hat\theta_n\| \le 2K^{-1/2}\,\|Q_{n,k(n)} - Q_n\|_U^{1/2}.
\]

It follows from Proposition H^ c) in Appendix [Fl and the choice of k{n) that (pij) holds under (i) 
as well as under (ii). 

Step 2: Now assume that C = 0- Note that inf a;gsi Pa (a;) > and ini^xe p{x,9) > because 
of Assumptions ID.31 and lP!2l (jP.3[ respectively). Let x > be such that mix(zQp^{x) > x and 
inffixe ^) > X- Then it follows from Proposition [T9l b) and Proposition nsT c) that there 
are events Cn^k(n) having probability tending to 1 as n — 00 such that on these events 9n^k(n) 
coincides with a SMD-estimator 9n,k{n) that is based on VitjXjD) instead of VitXjD). Since 
the assumptions of the theorem are also satisfied with 7^(t, x, D) instead of (, D), applying 
to 9n^k{n) what has already been established in Step 1 completes the proof. ■ 

Remark 26 (i) The preceding theorem was proved by showing that $\hat\theta_{n,k(n)}$ and $\hat\theta_n$ are sufficiently close (with Lemma 24 being instrumental here) and by applying Theorem 23. The reason for going this route instead of directly applying a mean-value expansion to the score $\partial Q_{n,k(n)}/\partial\theta$ is that this would require knowledge about differentiability properties of the mapping $\theta \mapsto \hat p_{k(n)}(\theta)$, which we were unable to obtain. [The usual approach to establish such differentiability properties via the implicit function theorem is not feasible here since $\hat p_{k(n)}(\theta)$ falls on the boundary of $\mathcal{P}(t,\zeta,D)$ as shown in Proposition [S].] A consequence of the method of proof chosen is that we have to assume at least $k(n)/n^2 \to \infty$. It is likely that, if the more direct method of proof via expansion of the score $\partial Q_{n,k(n)}/\partial\theta$ can be made to work, this would deliver asymptotic normality under weaker conditions on $k(n)$.

(ii) Nickl and Pötscher (2010) consider spline projection density estimators rather than NPML-estimators. Because of the simpler structure of these estimators, this allows them to also employ the alternative route via a mean-value expansion, leading to an asymptotic normality result under weaker growth conditions on $k(n)$. We note that Nickl and Pötscher (2010) consider only the correctly specified case. In this case and when $k(n)/n^2 \to \infty$ is assumed (as is in the present paper), the assumptions employed in Nickl and Pötscher (2010) and in the present paper are quite comparable, some differences being due to the different non-parametric estimators considered.

(iii) The asymptotic normality results given here are for a fixed underlying data-generating mechanism $P$. Under appropriate assumptions, corresponding results that are uniform in the underlying data-generating mechanism can be obtained, see Chapter 7 in Gach (2010).

A Appendix: Proofs for Sections [2] and [3] 

Proof of Proposition 2: (a) The implications (i) $\Rightarrow$ (ii) and (ii) $\Rightarrow$ (iii) are obvious. If $p$ is an element of $\mathcal{P}(t,\zeta,D)$, we have $1 = \int_\Omega p\,d\lambda \ge \int_\Omega \zeta\,d\lambda = \zeta\lambda(\Omega)$, showing that $\zeta \le \lambda(\Omega)^{-1}$. Furthermore, the Cauchy-Schwarz inequality implies $1 = \|p\|_1 \le \|p\|_2\|1\|_2 \le \|p\|_{t,2}\|1\|_2 \le D\lambda(\Omega)^{1/2}$, which implies $\lambda(\Omega)^{-1} \le D^2$. Thus (iii) implies (i).

(b) Suppose (i) holds. Then $\lambda(\Omega)^{-1} \in \mathcal{P}(t,\zeta,D)$ by Part (a). Suppose $p \in \mathcal{P}(t,\zeta,D)$. If now $\zeta = \lambda(\Omega)^{-1}$, then $p - \lambda(\Omega)^{-1} \ge 0$. But clearly $\int_\Omega (p - \lambda(\Omega)^{-1})\,d\lambda = 0$, implying that $p = \lambda(\Omega)^{-1}$ $\lambda$-a.e., and hence everywhere by continuity of $p$. If $\lambda(\Omega)^{-1} = D^2$, then $\|p\|_1 = \|p\|_2\|1\|_2$ follows from the calculations in the proof of Part (a). But this shows that $p$ is $\lambda$-a.e., and hence everywhere by continuity of $p$, proportional to the constant function 1, the proportionality factor necessarily being $\lambda(\Omega)^{-1}$. This proves that (i) implies (ii). That (ii) implies (iii) is trivial. Since the constant density $\lambda(\Omega)^{-1}$ belongs to $\mathcal{P}(t,\zeta,D)$ by Part (a), (iii) is equivalent to (ii). To show that (ii) implies (i), assume that $\zeta < \lambda(\Omega)^{-1} < D^2$. Choose $\varepsilon > 0$ small enough such that $\zeta < \lambda(\Omega)^{-1} - \varepsilon$ holds. Then define $f$ to be the restriction to $\Omega$ of the affine function that has the value $\lambda(\Omega)^{-1} - \varepsilon$ at the left endpoint of $\Omega$ and $\lambda(\Omega)^{-1} + \varepsilon$ at the right endpoint. By construction $f \in W_2^t(\Omega)$, integrates to 1, satisfies $\inf_\Omega f > \zeta$, and $\|f\|_{t,2} < D$ provided $\varepsilon$ is small enough. That is, $f$ is a further element of $\mathcal{P}(t,\zeta,D)$, contradicting (ii).

(c) Note that $\mathcal{P}(t,\zeta,D)$ is non-empty by Part (a). Since the defining conditions are convex, it is convex. That $\mathcal{P}(t,\zeta,D)$ is compact as claimed follows from Lemma 3 in Nickl (2007). [Note that the proof of this lemma does not use that $\zeta > 0$, as is implicit there, and therefore is also valid for $\zeta = 0$.] ■
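The affine function used in the proof of Part (b) can be checked numerically. The sketch below makes several assumptions that are not taken from the paper: $\Omega = (0,2)$, $\zeta = 0.3$, $D = 1$, $\varepsilon = 0.1$, and the Sobolev norm is taken to be the integer-order norm $\|f\|_{t,2}^2 = \sum_{j \le t}\|f^{(j)}\|_2^2$ (for an affine $f$ only the terms $j = 0,1$ are non-zero, so the value is the same for every integer $t \ge 1$).

```python
import numpy as np

# Assumed illustration values (not from the paper): Omega = (a, b), bounds zeta and D.
a, b = 0.0, 2.0
lam = b - a                      # Lebesgue measure of Omega
zeta, D, eps = 0.3, 1.0, 0.1     # satisfies zeta < 1/lam < D**2

m = 200000
x = a + (np.arange(m) + 0.5) * lam / m                 # midpoint-rule nodes on Omega
f = 1.0 / lam + eps * (2.0 * (x - a) / lam - 1.0)      # affine: 1/lam - eps up to 1/lam + eps
slope = 2.0 * eps / lam                                # constant first derivative of f

integral = np.mean(f) * lam                            # should equal 1
inf_f = 1.0 / lam - eps                                # infimum, attained at the left endpoint
norm_sq = np.mean(f ** 2) * lam + slope ** 2 * lam     # ||f||_2^2 + ||f'||_2^2
print(integral, inf_f, np.sqrt(norm_sq))
```

With these values $f$ is a density with $\inf_\Omega f = 0.4 > \zeta$ and norm of roughly $0.73 < D$, so it is an element of the auxiliary model different from the constant density, as the contradiction argument requires.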

Proof of Proposition [3) Since (a) is a special case of (b) it suffices to prove the latter: 
Suppose V' satisfies (i) and (ii), and choose 6 > Q small enough such that 5 < D — suppg^, lbl|t,2 
and CtS < mixen,peV' p{x) — C hold, where Ct is the constant appearing in Proposition [TJ For 
every p e V and / G W*(r!) with ||/||t,2 < S we then have |b + /|k2 < Iblk2 + ||/lk2 < 
suppgp, \\p\\t.2 + S < D and info(p + /) > infop - supf^ / > inf^gn.pfEP' p{x) - Ct6 > C (for the 
latter using Proposition [T|). This shows that Ut,s{p) H Ht is a subset of Vit, C,, D) for every p £ V' . 
Conversely, suppose V' is uniformly interior to VitXjD) relative to Hj. We first establish (i): 
Let (5 > be the radius figuring in the definition of being uniformly interior and let p V' he 
arbitrary. Choose a g G Ht different from p and define / — 6{q — p)/{2\\q — p||t.2)- [Note that q 
and hence / may depend on p.] Then f 0, ||/||t,2 = < 6, and fdX = hold. Observe 
that p + / and p— / then both belong to Ut,s{p) n Hj and hence to V{t, D), since Ut,s{p) H Hj C 
V{t, (, D) by assumption; in particular \\p + /||f,2 < D and \\p — f\\t.2 < D is satisfied. Since the 
Sobolev-norm originates from an inner product, we have |b+/|lt_2+lb~/llt,2 — 2 [lbll?.2 + ll/ll?.2] 
and thus Ibllt 2 — ~ ^^/4. Since this is true for every p € V' we obtain (i). We finally prove 
(ii): Let a;„ G and p„ G V' satisfy p„(a;„) — inix£n,pe'P' p{^)- The sequence Xn has a cluster 
point xq in the closure Cl of the interval fl. There exists a sufficiently small neighborhood A of 
xq in n and a C°° function h satisfying h{x) = —1 for all a; G An (which is non-empty) as 
well as hdX — 0. Furthermore, h can be chosen to be bounded with all its derivatives having 
compact support contained in fi; consequently, h G W2(r2). Since V' is uniformly interior to 
V{t, (, D) relative to Ht by assumption, it follows that Pn + ah £ V{t, (, D) for sufficiently small 
$\alpha > 0$, where $\alpha$ can be chosen independently of $n$. Consequently, $\inf_\Omega (p_n + \alpha h) \ge \zeta$ must hold.
But this implies Pnix„) > infop„ = inf^nnPn = infAnn {Pn - a) + a = inf^nn {Pn + ah) + a > 
info {Pn + cth) + a > (^ + a, which in turn implies inix^n^p^v' p{x) > C,+a > C. Finally, we prove 
Part (c): Note that X{n)-^ € V{t,C-,D) by Proposition H It is interior to V{tX,D) relative to 
Ht by Part (a) of the current proposition and the assumption ( < A(f2)^^ < D^. The second 
claim then follows from Theorem V.2.1. in Dunford and Schwartz (1966). ■ 

Proposition 27 Let $p_n, p \in \mathcal{P}(t,\zeta,D)$. Then the following statements are equivalent: (i) $\|p_n - p\|_\Omega$ converges to 0; (ii) $p_n$ converges pointwise to $p$; (iii) $p_n$ converges to $p$ a.e.; (iv) $p_n$ converges to $p$ on a dense subset of $\Omega$; (v) $\|p_n - p\|_{r,2}$ converges to 0 for some $r$ satisfying $0 \le r < t$; (vi) $\|p_n - p\|_{r,2}$ converges to 0 for all $r$ satisfying $0 \le r < t$.

Proof. To show that (v) implies (vi), it suffices, in light of Part (c) of Proposition [TJ to show 
that \\pn ~ p\\s.2 converges to for arbitrary s > r satisfying 1/2 < s < t. Since V{t,(^,D) 
is a compact subset of W|(r2) in view of Proposition [21 for any subsequence p„/ of Pn there 
exists a further subsequence Pn" of Pn' and a p* € V{t, D) such that \\pn" — P*\\s,2 converges 
to 0. By Part (c) of Proposition [1] we then have that also \\pn" — P*\\r,2 converges to since 
s > r. Because also \\pn" — p\\r,2 converges to as a consequence of (v) and keeping in mind 
that p and p* are continuous, it follows that p* = p. This shows that ||p„ — p|js,2 converges 
to 0. Furthermore, (i) implies (ii), (ii) implies (iii), and (iii) implies (iv). That (vi) implies (i) 
is a direct consequence of Part (b) of Proposition [T] It remains to show that (iv) implies (v) . 
Choose r such that 1/2 < r < t. The same compactness argument as above shows that for 
any subsequence pn' of p„ there exists a further subsequence Pn" of Pn' and a p* G V{t,C,,D) 
such that \\pn" — P*\\r,2 converges to 0. By Part (b) of Proposition [U we have that \\pn" — P*\\n 
converges to 0. Consequently, p and p* coincide on a dense subset of $7. Since p and p* are 
continuous, they are identical. This shows that \\pn" ^ p\\r,2 converges to 0, and hence the same 
is true for the entire sequence Pn ■ ■ 

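Remark 28 We note that $\mathcal{P}(t,\zeta,D)$ can equivalently be written as

\[
\Big\{p \in W_2^t(\Omega) : \int_\Omega p\,d\lambda = 1,\ \inf_{x\in\Omega} p(x) \ge \zeta,\ \|p - \lambda^{-1}(\Omega)\|_{t,2} \le \big(D^2 - \lambda^{-1}(\Omega)\big)^{1/2}\Big\}
\]

because $p - \lambda^{-1}(\Omega)$ and 1 are orthogonal in $W_2^t(\Omega)$. As a consequence, $\mathcal{P}(t,0,D) = \mathcal{P}(t,\zeta,D)$ at least for all $0 \le \zeta \le \lambda^{-1}(\Omega) - C_t\big(D^2 - \lambda^{-1}(\Omega)\big)^{1/2}$, since $p \in \mathcal{P}(t,0,D)$ implies $\inf_{x\in\Omega} p(x) \ge \zeta$ for such $\zeta$ by Proposition 1(b).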

Assumptions on the density functions in the class Vq and on the simulation mechanism p are 
of course related to each other, but the interrelationship is somewhat intricate. The following 
proposition collects two important observations. 

Proposition 29 If Assumption P.1 is satisfied, then Assumption R.1 implies Assumption P.4. However, in general Assumption R.1 does not imply Assumption P.4.

Proof. The first claim is proved as follows: Let F{z,d) — js^^^n- x<z}P^ distribution 
function on 51 that is associated with pg. Let (?„,0 e O be such that On converges to 9. Now 
Assumption IR.ll implies that p{-,9n) converges to p{-,9) in distribution under /z. Noting that 
F{-,9) and F{-,9n) are the distribution functions of p{-,9) and p{-,9), respectively, as well as 
noting that F{-,9) is continuous in its first argument, it follows that F{z, 9n) converges to F{z, 9) 
for every z € VI. By Assumption IP. II and sup-norm compactness of P(t, C, D) it follows that 
every subsequence $p_{\theta_{n'}}$ of $p_{\theta_n}$ has a further subsequence $p_{\theta_{n''}}$ that converges to an element $p^* \in \mathcal{P}(t,\zeta,D)$ in the sup-norm. But this clearly implies that $F(z,\theta_{n''})$ converges to $\int_{\{x\in\Omega:\,x\le z\}} p^*\,d\lambda$ for every $z\in\Omega$. It follows that $p^* = p_\theta$ a.e., hence everywhere on $\Omega$ by continuity of $p_\theta$ and $p^*$. This proves the first claim. For a proof of the second claim see Proposition 5 in Gach (2010). ■

B Appendix: Properties of the Non-Parametric Likelihood 
Function 

Proposition 30 

(a) For every non-negative B (CI) -measurable real-valued function f the map (xi, . . . , a;„) h- !■ 
Ln{f] xi, . . . , Xn) is B{n)"-B{[—(X}, oo)) -measurable, and the map (wi, . . . , v^) H> Lk{9, /; wi, . . . , Vk) 
is V'^-,B([— oo, oo)) -measurable for every G O. 

(b) Let !F be a set of non-negative bounded real-valued functions on U,. 

(bl ) Then, for every [xi, . . . , x„) G 17", / i-> Ln{f; Xi, . . . , x„) is a continuous map from 
{T, 11 • to [—00,00). The same is true for the map f i-> Lk{9, f;vi, . . . ,Vk) for every 6* G O 
and every {vi, . . . ,Vk) G V'^ . 

(b2) If the elements f £ J- are additionally also continuous and Assumption \R.1\ is 
satisfied, then, for every {vi,...,Vk) G V'^ , {0,f) i— > Lk{0, f;vi, . . . ,Vk) is a continuous map 
from 6 X (J", |j • ||o) to [-00, 00). 

(c) Let J- be a set of non-negative bounded B{Q) -measurable real-valued functions on il that 
are uniformly bounded away from 0. 

(cl) Then L{f) is a continuous real-valued function on {J- , \\ ■ \\n). The same is true for 
L{9, f) for every given 6 Cz Q. 

(c2) If the elements f G J-^ are additionally also continuous and Assumvtion \R.l\ is 
satisfied, then L{6, f) is a continuous real-valued function on Q x {!F, \\ ■ ||n). 

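(d) Let $\mathcal{F}$ be a sup-norm compact set of non-negative bounded $\mathcal{B}(\Omega)$-measurable real-valued functions on $\Omega$ that are uniformly bounded away from 0.

(d1) Then

$$\lim_{n \to \infty} \sup_{f \in \mathcal{F}} |L_n(f) - L(f)| = 0 \quad P\text{-a.s.},$$

and, for every $\theta \in \Theta$,

$$\lim_{k \to \infty} \sup_{f \in \mathcal{F}} |L_k(\theta, f) - L(\theta, f)| = 0 \quad \mu\text{-a.s.}$$

(d2) If the elements $f \in \mathcal{F}$ are additionally also continuous and Assumption R.1 is satisfied, then

$$\lim_{k \to \infty} \sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} |L_k(\theta, f) - L(\theta, f)| = 0 \quad \mu\text{-a.s.}$$

(In Part (d) we use the convention that the supremum is 0 if $\mathcal{F}$ is empty.)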

Proof. (a) The first claim is clear as $f$ is $\mathcal{B}(\Omega)$-$\mathcal{B}([0, \infty))$-measurable by hypothesis and the extended logarithm is $\mathcal{B}([0, \infty))$-$\mathcal{B}([-\infty, \infty))$-measurable. For the second claim additionally use that $\rho : V \times \Theta \to \Omega$ is $\mathcal{V}$-$\mathcal{B}(\Omega)$-measurable in the first argument for every $\theta \in \Theta$.

(b) To prove the first claim in Part (b1), fix $(x_1, \ldots, x_n) \in \Omega^n$. Let $f_l, f \in \mathcal{F}$ be such that $\|f_l - f\|_\Omega$ converges to 0. Since setting $\log 0 = -\infty$ continuously extends the logarithm to the interval $[0, \infty)$, $\log f_l(x_i)$ then converges to $\log f(x_i)$ for every $i$, thus establishing the first claim. The second claim in Part (b1) is proved analogously. To prove Part (b2), fix $(v_1, \ldots, v_k) \in V^k$ and let $\theta_l, \theta \in \Theta$ and $f_l, f \in \mathcal{F}$ be such that $\|\theta_l - \theta\|$ and $\|f_l - f\|_\Omega$ converge to 0. Use the triangle inequality to obtain for every $i$

$$|f_l(\rho(v_i, \theta_l)) - f(\rho(v_i, \theta))| \le |f_l(\rho(v_i, \theta_l)) - f(\rho(v_i, \theta_l))| + |f(\rho(v_i, \theta_l)) - f(\rho(v_i, \theta))|$$

$$\le \|f_l - f\|_\Omega + |f(\rho(v_i, \theta_l)) - f(\rho(v_i, \theta))|. \tag{32}$$

The first expression on the r.h.s. of (32) converges to 0 by hypothesis. Making use of Assumption R.1 and the continuity of $f$, the second one converges to 0 as well. Continuity of the extended logarithm on $[0, \infty)$ delivers Part (b2).

(c) To prove the first claim in Part (c1), denote by $\chi > 0$ the lower uniform bound of all elements in $\mathcal{F}$. Let $f_l, f \in \mathcal{F}$ be such that $\|f_l - f\|_\Omega$ converges to 0. Then $\{f_l : l \in \mathbb{N}\}$ is bounded by some $B$, $0 < B < \infty$. Since the logarithm is bounded on $[\chi, B]$, the domination condition

$$\int_\Omega \sup_{l \in \mathbb{N}} |\log f_l(x)| \, dP(x) < \infty$$

is satisfied. By the already established Part (b1) (with $n = 1$), $\log f_l(x)$ converges to $\log f(x)$ for every $x \in \Omega$. The first claim then follows from the theorem of dominated convergence. The second claim in Part (c1) is proved in exactly the same manner. To prove Part (c2), let $\theta_l, \theta \in \Theta$ and $f_l, f \in \mathcal{F}$ be such that $\|\theta_l - \theta\|$ and $\|f_l - f\|_\Omega$ converge to 0. By the same argument as before, the domination condition

$$\int_V \sup_{\theta \in \Theta} \sup_{l \in \mathbb{N}} |\log f_l(\rho(v, \theta))| \, d\mu(v) < \infty$$

is satisfied. By the already established Part (b2) (with $k = 1$), $\log f_l(\rho(v, \theta_l))$ converges to $\log f(\rho(v, \theta))$ for every $v \in V$. Part (c2) then follows from the theorem of dominated convergence.

(d) To prove the first claim in Part (d1), we use Mourier's strong law of large numbers as given in Corollary 7.10 of Ledoux and Talagrand (1991) with the separable Banach space $(B, \|\cdot\|)$ given by $(C(\mathcal{F}, \|\cdot\|_\Omega), \|\cdot\|_{\mathcal{F}})$ and the mapping $X$ given by $X(f) = \log f(X_1) - \int_\Omega \log f \, dP$ for $f \in \mathcal{F}$. Note that $X$ has values in $C(\mathcal{F}, \|\cdot\|_\Omega)$ by using the already established Parts (b1) and (c1) in conjunction with the assumed sup-norm compactness of $\mathcal{F}$. Clearly, $X(f)$ is a random variable for every $f \in \mathcal{F}$, and hence $X$ is measurable with respect to the $\sigma$-field on $C(\mathcal{F}, \|\cdot\|_\Omega)$ that is generated by the point-evaluations. Since this $\sigma$-field coincides with the Borel $\sigma$-field on $C(\mathcal{F}, \|\cdot\|_\Omega)$ (see, e.g., Problem 1 in Section 1.7 in van der Vaart and Wellner (1996) and observe that $(\mathcal{F}, \|\cdot\|_\Omega)$ is a compact metric space), $X$ is a Borel random mapping. The integrability condition $E\|X\| < \infty$ follows from

$$\int_\Omega \sup_{f \in \mathcal{F}} |\log f(x)| \, dP(x) < \infty,$$

which is true since the elements of $\mathcal{F}$ are uniformly bounded and uniformly bounded away from 0 by hypothesis. The second claim in Part (d1) is proved completely analogously. Part (d2) is proved in a similar manner: Apply Corollary 7.10 in Ledoux and Talagrand (1991) with $B$ the separable Banach space of all bounded, continuous functions on $\Theta \times (\mathcal{F}, \|\cdot\|_\Omega)$ equipped with the sup-norm $\|\cdot\|_{\Theta \times \mathcal{F}}$ and with $X$ given by $X(\theta, f) = \log f(\rho(V_1, \theta)) - \int_V \log f(\rho(\cdot, \theta)) \, d\mu$. Note that by the already established Parts (b2) and (c2) in conjunction with compactness of $\Theta \times (\mathcal{F}, \|\cdot\|_\Omega)$, $X$ takes its values in the space of (bounded) continuous functions on $\Theta \times (\mathcal{F}, \|\cdot\|_\Omega)$. Again $X$ is a Borel random mapping. The integrability condition $E\|X\| < \infty$ now follows from

$$\int_V \sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} |\log f(\rho(v, \theta))| \, d\mu(v) < \infty,$$

which is true since the elements of $\mathcal{F}$ are uniformly bounded and uniformly bounded away from 0 by hypothesis. ■

Proof of Lemma 7. (a) It is sufficient to show that

$$\int_\Omega (\log p_\blacktriangle)^- \, dP = \int_{\{x \in \Omega : p_\blacktriangle(x) > 0\}} (\log p_\blacktriangle)^- \, p_\blacktriangle \, d\lambda < \infty.$$

By Assumption D this is equivalent to showing that

$$\int_{\{x \in \Omega : 0 < p_\blacktriangle(x) < 1\}} h(p_\blacktriangle) \, d\lambda < \infty, \tag{33}$$

where $h(y)$ is defined by $h(y) = -y \log y$ for every $y \in (0, 1]$. Since $h : (0, 1] \to [0, \infty)$ can be continuously extended to $[0, 1]$ by setting $h(0) = 0$, it is bounded on the compact interval $[0, 1]$, and a fortiori on $(0, 1]$. But this establishes (33) since $\lambda(\Omega) < \infty$ and thus completes the proof for $L$. The proof for $L(\theta, \cdot)$ is analogous upon observing that

$$\int_V (\log p_\theta(\rho(\cdot, \theta)))^- \, d\mu = \int_{\{x \in \Omega : p(x, \theta) > 0\}} (\log p_\theta)^- \, p_\theta \, d\lambda \tag{34}$$

by the change of variable theorem.

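(b) For any $p \in \mathcal{P}(t, \zeta, D)$ different from $p_\blacktriangle$, the set $\{x \in \Omega : p(x) \ne p_\blacktriangle(x) > 0\}$ has positive $P$-probability since $p$ and $p_\blacktriangle$ are continuous functions on $\Omega$. In view of the already established Part (a) the expression $L(p) - L(p_\blacktriangle)$ is well-defined, and the strict Jensen inequality gives

$$L(p) - L(p_\blacktriangle) = \int_{\{x \in \Omega : p_\blacktriangle(x) > 0\}} \log \frac{p}{p_\blacktriangle} \, dP < \log \int_{\{x \in \Omega : p_\blacktriangle(x) > 0\}} \frac{p}{p_\blacktriangle} \, dP \le 0.$$

(c) Follows similarly to Part (b) in view of the representation

$$L(\theta, p_\theta) = \int_{\{x \in \Omega : p(x, \theta) > 0\}} \log(p_\theta) \, p_\theta \, d\lambda.$$

■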
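The strict Jensen step in Part (b) can be checked numerically on a toy example. The sketch below is only an illustration under assumed ingredients that do not come from the paper: $\Omega = [0, 1]$, a uniform data-generating density, the competitor $p(x) = 1/2 + x$, and a midpoint rule in place of exact integration.

```python
import math

N_CELLS = 10_000  # midpoint-rule resolution (an arbitrary choice)
GRID = [(i + 0.5) / N_CELLS for i in range(N_CELLS)]


def p_true(x: float) -> float:
    """Hypothetical data-generating density: uniform on [0, 1]."""
    return 1.0


def p_alt(x: float) -> float:
    """Hypothetical competing density 1/2 + x; it integrates to 1 on [0, 1]."""
    return 0.5 + x


def log_likelihood_gap() -> float:
    """Midpoint approximation of L(p) - L(p_true) = int log(p / p_true) dP."""
    total = sum(math.log(p_alt(x) / p_true(x)) * p_true(x) for x in GRID)
    return total / N_CELLS


def jensen_upper_bound() -> float:
    """Midpoint approximation of log int (p / p_true) dP, which is log 1 here."""
    total = sum(p_alt(x) / p_true(x) * p_true(x) for x in GRID)
    return math.log(total / N_CELLS)


gap = log_likelihood_gap()
bound = jensen_upper_bound()
```

In this example the gap is strictly negative while the Jensen bound is numerically zero, which is the ordering the proof relies on.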

Part (a) of the following proposition is essentially given in Proposition 3 in Nickl (2007). [We note that the set $V$ defined there is not sup-norm open as implicitly claimed, the apparently intended definition in the notation of Nickl (2007) being $V = \{d \in L^\infty(\Omega) : \inf_{x \in \Omega} d(x) > \zeta/2\}$. Inspection of the proof shows that this proposition remains correct for $\zeta = 0$.] The proof for Part (b) is completely analogous.

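Proposition 31 Define $U = \{f \in L^\infty(\Omega) : \inf_{x \in \Omega} f(x) > 0\}$. Let $\alpha$ be a positive integer, $f \in U$, and $f_1, \ldots, f_\alpha \in L^\infty(\Omega)$.

(a) The $\alpha$-th Fréchet derivatives of $L_n : U \to \mathbb{R}$ and $L : U \to \mathbb{R}$ are given by

$$D^\alpha L_n(f)(f_1, \ldots, f_\alpha) = (-1)^{\alpha - 1} (\alpha - 1)! \, P_n(f^{-\alpha} f_1 \cdots f_\alpha),$$

$$D^\alpha L(f)(f_1, \ldots, f_\alpha) = (-1)^{\alpha - 1} (\alpha - 1)! \, P(f^{-\alpha} f_1 \cdots f_\alpha).$$

(b) The $\alpha$-th partial Fréchet derivatives of $L_k : \Theta \times U \to \mathbb{R}$ and $L : \Theta \times U \to \mathbb{R}$ with respect to the second variable are, for $\theta \in \Theta$, given by

$$D^\alpha L_k(\theta, f)(f_1, \ldots, f_\alpha) = (-1)^{\alpha - 1} (\alpha - 1)! \, \mu_k\bigl(f^{-\alpha}(\rho(\cdot, \theta)) f_1(\rho(\cdot, \theta)) \cdots f_\alpha(\rho(\cdot, \theta))\bigr),$$

$$D^\alpha L(\theta, f)(f_1, \ldots, f_\alpha) = (-1)^{\alpha - 1} (\alpha - 1)! \, \mu\bigl(f^{-\alpha}(\rho(\cdot, \theta)) f_1(\rho(\cdot, \theta)) \cdots f_\alpha(\rho(\cdot, \theta))\bigr)$$

$$= (-1)^{\alpha - 1} (\alpha - 1)! \int_\Omega f^{-\alpha} f_1 \cdots f_\alpha \, p_\theta \, d\lambda.$$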
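The derivative formula in Part (a) can be sanity-checked by finite differences for $\alpha = 1, 2$. The following is a minimal sketch, not part of the original argument: since $L_n$ depends on $f$ only through its values at the sample points, $f$ and the directions are represented by (hypothetical) vectors of such values.

```python
import math


def log_lik(f_vals):
    """L_n(f) = P_n log f, with f represented by its values at the sample points."""
    return sum(math.log(v) for v in f_vals) / len(f_vals)


def frechet_derivative(f_vals, directions):
    """(-1)^(a-1) (a-1)! P_n(f^(-a) f_1 ... f_a) for a = len(directions)."""
    a = len(directions)
    total = 0.0
    for i, fx in enumerate(f_vals):
        prod = 1.0
        for h in directions:
            prod *= h[i]
        total += prod / fx ** a
    return (-1) ** (a - 1) * math.factorial(a - 1) * total / len(f_vals)


def shifted(f_vals, shifts):
    """Return f + sum_j c_j * h_j evaluated at the sample points."""
    out = list(f_vals)
    for c, h in shifts:
        out = [v + c * hv for v, hv in zip(out, h)]
    return out


# Hypothetical values of f, h1, h2 at n = 4 sample points.
f = [1.0, 2.0, 0.5, 1.5]
h1 = [0.3, -0.2, 0.1, 0.4]
h2 = [-0.1, 0.5, 0.2, 0.3]

eps = 1e-3
# Central difference for the first derivative in direction h1.
fd_first = (log_lik(shifted(f, [(eps, h1)])) - log_lik(shifted(f, [(-eps, h1)]))) / (2 * eps)
# Mixed central difference for the second derivative in directions (h1, h2).
fd_second = (
    log_lik(shifted(f, [(eps, h1), (eps, h2)]))
    - log_lik(shifted(f, [(eps, h1), (-eps, h2)]))
    - log_lik(shifted(f, [(-eps, h1), (eps, h2)]))
    + log_lik(shifted(f, [(-eps, h1), (-eps, h2)]))
) / (4 * eps ** 2)
```

Comparing `fd_first` with `frechet_derivative(f, [h1])` and `fd_second` with `frechet_derivative(f, [h1, h2])` should show agreement up to the finite-difference error.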
The next result is a uniform version of Lemma 2 in Nickl (2007). It provides rates of convergence for all derivatives of the auxiliary log-likelihood function that hold uniformly in $\theta$ and $p$.

Proposition 32 Let $\alpha$ be a positive integer, and let $\mathcal{H}_1, \ldots, \mathcal{H}_\alpha$ be bounded subsets of some Sobolev space $W_2^s(\Omega)$ of order $s > 1/2$. If Assumption R.2 and $\zeta > 0$ are satisfied, then

$$\sup_{\Theta \times \mathcal{P}(t, \zeta, D)} \|D^\alpha L_k(\theta, p) - D^\alpha L(\theta, p)\|_{\mathcal{H}_1, \ldots, \mathcal{H}_\alpha} = O_\mu(k^{-1/2}) \quad \text{as } k \to \infty. \tag{35}$$

Proof. Note that

$$\sup_{\Theta \times \mathcal{P}(t, \zeta, D)} \|D^\alpha L_k(\theta, p) - D^\alpha L(\theta, p)\|_{\mathcal{H}_1, \ldots, \mathcal{H}_\alpha} = (\alpha - 1)! \, \|\mu_k - \mu\|_{\mathcal{H}^*}$$

by Proposition 31, where $\mathcal{H}^* = \{h(\rho(\cdot, \theta)) : h \in \mathcal{H}, \theta \in \Theta\}$ and

$$\mathcal{H} = \{p^{-\alpha} h_1 \cdots h_\alpha : p \in \mathcal{P}(t, \zeta, D), \ h_1 \in \mathcal{H}_1, \ldots, h_\alpha \in \mathcal{H}_\alpha\}.$$

Since $\zeta > 0$, the class $\mathcal{H}$ is a bounded subset of the Sobolev space $W_2^r(\Omega)$ with $r = \min(t, s) > 1/2$ by Proposition 1. Measurability of the supremum on the l.h.s. of (35) now follows immediately from Proposition 37 in Appendix D. The class $\mathcal{H}^*$ is $\mu$-Donsker by an application of Proposition 11(a), hence $\|\mu_k - \mu\|_{\mathcal{H}^*}$ is bounded in probability at rate $k^{-1/2}$ by Prohorov's theorem. ■



The following lemma is a special case of Berge's (1963) maximum theorem. 

Lemma 33 Let $X$ be a metrizable space and $Y$ a compact metrizable space. Let $u : X \times Y \to [-\infty, \infty)$ be a continuous function that has a unique maximizer, say $v(x)$, on the fiber $\{(x, y) : y \in Y\}$ for every $x \in X$. Then the mapping $v : X \to Y$ is continuous.
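As a purely illustrative instance of Lemma 33 (the objective below is a hypothetical choice, not taken from the paper), take $X = \mathbb{R}$, $Y = [-1, 1]$ and $u(x, y) = -(y - \sin x)^2$. The unique maximizer on each fiber is $v(x) = \sin x$, which is continuous in $x$. The sketch approximates $v$ by a grid search over $Y$.

```python
import math

Y_STEP = 1e-3
# Grid on the compact set Y = [-1, 1].
Y_GRID = [-1.0 + Y_STEP * j for j in range(2001)]


def u(x: float, y: float) -> float:
    """Hypothetical continuous objective with unique maximizer y = sin(x)."""
    return -(y - math.sin(x)) ** 2


def argmax_y(x: float) -> float:
    """Grid approximation of the maximizer v(x) of u(x, .) over Y."""
    return max(Y_GRID, key=lambda y: u(x, y))
```

Up to the grid resolution, `argmax_y(x)` tracks `math.sin(x)`, so small changes in `x` produce small changes in the maximizer.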



C Appendix: Proofs for Section 14.2 



The following lemma is a consequence of Birman and Solomyak (1967), cf. Lorentz, v. Golitschek, and Makovoz (1996), p. 506. It can also be obtained from Theorem 1 in Nickl and Pötscher (2007) via a retraction argument; see Gach (2010).

Lemma 34 Let $\mathcal{F}$ be a bounded subset of the Sobolev space $W_2^s(\Omega)$ of order $s > 1/2$. Then the sup-norm metric entropy of $\mathcal{F}$ satisfies

$$H(\varepsilon, \mathcal{F}, W_2^s(\Omega), \|\cdot\|_\Omega) \lesssim \varepsilon^{-1/s}.$$



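Proof of Proposition 11. (a) Choose a real number $r \le s$ satisfying $1/2 < r < 3/2$ and $2r - 1 \le a$, where $a$ is as in Assumption R.2. Then $\mathcal{F}$ can also be viewed as a bounded subset of $W_2^r(\Omega)$, and hence of $C^{r - 1/2}(\Omega)$, in view of Proposition 1(b),(c). We use this to obtain

$$\sup_{f \in \mathcal{F}} |f(\rho(v, \theta')) - f(\rho(v, \theta))| \le L_r |\rho(v, \theta') - \rho(v, \theta)|^{r - 1/2} \le L_r \bigl[R(v) \|\theta' - \theta\|^{\gamma}\bigr]^{r - 1/2}$$

for some finite constant $L_r > 0$ and all $v \in V$, all $\theta, \theta' \in \Theta$, where we have made use of Assumption R.2. A cover of $\mathcal{F}^*$ is obtained from suitable covers of $\Theta$ and $\mathcal{F}$ as follows: Fix $\varepsilon > 0$ and set $\delta(\varepsilon) = (\varepsilon / L_r)^{1/\nu}$, where $\nu := \gamma(r - 1/2)$. To cover $\Theta$, note that it is contained in an $m$-cube of edge length $l$ and thus in the union of at most $\lceil l \sqrt{m} / \delta(\varepsilon) \rceil^m$-many closed Euclidean balls $B(\theta_i, \delta(\varepsilon))$ with centers $\theta_i \in \Theta$ and radius $\delta(\varepsilon)$, where $\lceil x \rceil$ denotes the smallest integer not less than $x$. To cover $\mathcal{F}$, we take $N(\varepsilon, \mathcal{F}, W_2^s(\Omega), \|\cdot\|_\Omega)$-many sup-norm closed balls $[f_j - 2\varepsilon, f_j + 2\varepsilon]$ of radius $2\varepsilon$ whose centers $f_j$ already belong to $\mathcal{F}$. [Note that this can always be achieved.] We claim that the brackets

$$\bigl[f_j(\rho(\cdot, \theta_i)) - R^{\nu/\gamma}(\cdot)\varepsilon - 2\varepsilon, \ f_j(\rho(\cdot, \theta_i)) + R^{\nu/\gamma}(\cdot)\varepsilon + 2\varepsilon\bigr] \tag{36}$$

with $i = 1, \ldots, \lceil l \sqrt{m} / \delta(\varepsilon) \rceil^m$ and $j = 1, \ldots, N(\varepsilon, \mathcal{F}, W_2^s(\Omega), \|\cdot\|_\Omega)$ provide a cover of $\mathcal{F}^*$. To see this, let $h \in \mathcal{F}^*$, that is, $h = f(\rho(\cdot, \theta))$ for some $\theta \in \Theta$ and $f \in \mathcal{F}$, implying that there are indices $i, j$ such that $\theta \in B(\theta_i, \delta(\varepsilon))$ and $f \in [f_j - 2\varepsilon, f_j + 2\varepsilon]$. Consequently,

$$h \in [f_j(\rho(\cdot, \theta)) - 2\varepsilon, \ f_j(\rho(\cdot, \theta)) + 2\varepsilon].$$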

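Now,

$$h(v) \le f_j(\rho(v, \theta)) + 2\varepsilon \le f_j(\rho(v, \theta_i)) + |f_j(\rho(v, \theta)) - f_j(\rho(v, \theta_i))| + 2\varepsilon$$

$$\le f_j(\rho(v, \theta_i)) + R^{\nu/\gamma}(v)\varepsilon + 2\varepsilon$$

for all $v \in V$, where the last inequality follows from the first display in the proof and the choice of $\delta(\varepsilon)$. Similarly,

$$f_j(\rho(v, \theta_i)) - R^{\nu/\gamma}(v)\varepsilon - 2\varepsilon \le h(v).$$

By construction of $r$, we have that $\int_V (R^{\nu/\gamma})^2 \, d\mu < \infty$, and hence the $\mathcal{L}^2(\mu)$-bracketing size of any of the brackets in (36) can be bounded by $\varepsilon$ times a positive constant $c$ that only depends on $R$, $r$, and $\mu$. Using the elementary inequality $\lceil x \rceil^m \le \max(1, (2x)^m)$ this leads to the relationship

$$N_{[\,]}(c\varepsilon, \mathcal{F}^*, \|\cdot\|_{2,\mu}) \le \max\bigl(1, (2 l \sqrt{m} L_r^{1/\nu})^m \varepsilon^{-m/\nu}\bigr) \, N(\varepsilon, \mathcal{F}, W_2^s(\Omega), \|\cdot\|_\Omega).$$

Apply Lemma 34 to get

$$H_{[\,]}(\varepsilon, \mathcal{F}^*, \|\cdot\|_{2,\mu}) \lesssim \max(0, 1 - \log \varepsilon) + \varepsilon^{-1/s} \lesssim \varepsilon^{-1/s},$$

which proves the asserted bracketing entropy bound. The claim that $\mathcal{F}^*$ is $\mu$-Donsker now follows from Ossiander's central limit theorem (see Theorem 7.2.1 in Dudley, 1999) since clearly $\mathcal{F}^* \subseteq \mathcal{L}^2(V, \mathcal{V}, \mu)$ holds.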

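(b) For any fixed $\varepsilon > 0$, we take for $\mathcal{F}^*$ the cover given in (36). Since the elements of $\mathcal{F}$ are bounded below by $\chi > 0$, the sets

$$\bigl[\log \max\bigl(\chi, f_j(\rho(\cdot, \theta_i)) - R^{\nu/\gamma}(\cdot)\varepsilon - 2\varepsilon\bigr), \ \log\bigl(f_j(\rho(\cdot, \theta_i)) + R^{\nu/\gamma}(\cdot)\varepsilon + 2\varepsilon\bigr)\bigr],$$

for $i = 1, \ldots, \lceil l \sqrt{m} / \delta(\varepsilon) \rceil^m$, $j = 1, \ldots, N(\varepsilon, \mathcal{F}, W_2^s(\Omega), \|\cdot\|_\Omega)$, are non-empty brackets and cover $\log \mathcal{F}^*$. Since the logarithm is Lipschitz on $[\chi, \infty)$ with Lipschitz constant $\chi^{-1}$, the $\mathcal{L}^2(\mu)$-bracketing size of these brackets can be bounded by $\chi^{-1}$ times the $\mathcal{L}^2(\mu)$-bracketing size of the corresponding brackets given in (36). Arguing now as in the proof of Part (a) completes the proof. ■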
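The counting step for the cover of the parameter set can be made concrete. The sketch below is an illustration under assumptions of our own: it covers the full cube $[0, l]^m$ rather than a general $\Theta$, places the ball centers at the midpoints of a regular grid (so they lie in the cube, not necessarily in $\Theta$), and the function names are hypothetical. It also checks the elementary inequality $\lceil x \rceil^m \le \max(1, (2x)^m)$.

```python
import itertools
import math


def cube_ball_cover(edge: float, m: int, delta: float):
    """Centers of ceil(edge*sqrt(m)/delta)**m closed balls of radius delta covering [0, edge]^m."""
    n_per_axis = math.ceil(edge * math.sqrt(m) / delta)
    step = edge / n_per_axis  # sub-cube edge, at most delta / sqrt(m)
    axis = [(i + 0.5) * step for i in range(n_per_axis)]
    return list(itertools.product(axis, repeat=m))


def is_covered(point, centers, delta: float) -> bool:
    """True if the point lies in at least one closed Euclidean ball of radius delta."""
    return any(math.dist(point, c) <= delta for c in centers)


def ceil_power_bound_holds(x: float, m: int) -> bool:
    """Check ceil(x)**m <= max(1, (2x)**m) for a given x > 0."""
    return math.ceil(x) ** m <= max(1.0, (2.0 * x) ** m)


centers = cube_ball_cover(edge=1.0, m=2, delta=0.3)
```

Each sub-cube has half-diagonal at most $\delta/2$, so every point of the cube is within $\delta$ of some center; the count $\lceil l\sqrt{m}/\delta \rceil^m$ is therefore sufficient, though not sharp.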

D Appendix: Measurability Issues 

Lemma 35 Suppose $t > 1/2$. Then the Borel $\sigma$-fields $\mathcal{B}(W_2^t(\Omega), \|\cdot\|_\Omega)$, and $\mathcal{B}(W_2^t(\Omega), \|\cdot\|_{s,2})$ for $0 \le s \le t$ all coincide. In particular, the norms $\|\cdot\|_\Omega$ and $\|\cdot\|_{s,2}$ for $0 \le s \le t$ are $\mathcal{B}(W_2^t(\Omega), \|\cdot\|_{t,2})$-measurable.

Proof. Since the $\|\cdot\|_\Omega$-topology on $W_2^t(\Omega)$ is coarser than the $\|\cdot\|_{s,2}$-topology on $W_2^t(\Omega)$, which in turn is coarser than the $\|\cdot\|_{t,2}$-topology on $W_2^t(\Omega)$ (cf. Proposition 1), it suffices to show that $\mathcal{B}(W_2^t(\Omega), \|\cdot\|_{t,2}) \subseteq \mathcal{B}(W_2^t(\Omega), \|\cdot\|_\Omega)$. The former $\sigma$-field is generated by the collection of all closed $\|\cdot\|_{t,2}$-balls since $(W_2^t(\Omega), \|\cdot\|_{t,2})$ is separable. As shown in the proof of Lemma 3 in Nickl (2007), these balls are $\|\cdot\|_\Omega$-compact and hence belong to $\mathcal{B}(W_2^t(\Omega), \|\cdot\|_\Omega)$. ■

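Proposition 36 (a) The quantities $\|\hat p_n - p_\blacktriangle\|_\Omega$, $\|\hat p_n - p_\blacktriangle\|_{s,2}$ for $0 \le s \le t$, $\|\tilde p_k(\theta) - p_\theta\|_\Omega$, and $\|\tilde p_k(\theta) - p_\theta\|_{s,2}$ for $0 \le s \le t$ are random variables.

(b) Suppose Assumptions P.1 and R.1 are satisfied. Then $\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_\Omega$ and $\sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_{s,2}$ for $0 \le s \le t$ are random variables.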

Proof. (a) Follows immediately from Theorem 5 and Lemma 35. (b) By Assumption R.1 and Proposition [2ni in Appendix \X\ the parameterization $\theta \mapsto p_\theta(x)$ is continuous, and hence is continuous in the $\|\cdot\|_\Omega$- and $\|\cdot\|_{s,2}$-norms ($0 \le s \le t$) in view of Assumption P.1 and Proposition [57] in Appendix 121. By Theorem 5(b) and again Proposition [27], $\theta \mapsto \tilde p_k(\theta) - p_\theta$ is then continuous in the same norms. Since $\Theta$ is separable, (b) follows from Part (a). ■

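Proposition 37 Suppose $s > 1/2$.

(a) Then

$$\mathfrak{X}_n(x, f) = \sqrt{n} \Bigl( \int_\Omega \hat p_n(\cdot; x_1, \ldots, x_n) f(\cdot) \, d\lambda - P(f) \Bigr)$$

and

$$\mathfrak{Y}_n(x, f) = n^{-1/2} \sum_{i=1}^n (f(x_i) - P(f))$$

are Borel measurable on $\Omega^n$ for every $f \in W_2^s(\Omega)$, where $x$ denotes $(x_1, \ldots, x_n) \in \Omega^n$. Furthermore, if $\mathcal{F}$ is a non-empty bounded subset of $W_2^s(\Omega)$, then $\sup_{f \in \mathcal{F}} |\mathfrak{Z}_n(x, f)|$ is Borel measurable on $\Omega^n$, where $\mathfrak{Z}_n$ stands for any of $\mathfrak{X}_n$, $\mathfrak{Y}_n$, and $\mathfrak{X}_n - \mathfrak{Y}_n$.

(b) Then

$$\mathfrak{U}_k(v, \theta, f) = \sqrt{k} \int_\Omega \bigl(\tilde p_k(\theta)(\cdot; v_1, \ldots, v_k) - p_\theta(\cdot)\bigr) f(\cdot) \, d\lambda$$

and

$$\mathfrak{V}_k(v, \theta, f) = k^{-1/2} \sum_{i=1}^k \bigl(f(\rho(v_i, \theta)) - \mu(f(\rho(\cdot, \theta)))\bigr)$$

are Borel measurable on $V^k$ for every $\theta \in \Theta$ and every $f \in W_2^s(\Omega)$, where $v$ denotes $(v_1, \ldots, v_k) \in V^k$. Furthermore, if Assumption R.1 is satisfied and $\mathcal{F}$ is a non-empty bounded subset of $W_2^s(\Omega)$, then $\sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} |\mathfrak{V}_k(v, \theta, f)|$ is Borel measurable on $V^k$; if, additionally, Assumption P.1 holds, then $\sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} |\mathfrak{W}_k(v, \theta, f)|$ is Borel measurable on $V^k$, where $\mathfrak{W}_k$ stands for any of $\mathfrak{U}_k$ and $\mathfrak{U}_k - \mathfrak{V}_k$.

(c) Then

$$\mathfrak{T}_k(v, \theta, f) = k^{-1} \sum_{i=1}^k \tilde p_k^{-1}(\theta)(\rho(v_i, \theta); v_1, \ldots, v_k) \, f(\rho(v_i, \theta))$$

is Borel measurable on $V^k$ for every $\theta \in \Theta$ and every $f \in W_2^s(\Omega)$. Furthermore, if Assumption R.1 is satisfied, $\mathcal{F}$ is a non-empty bounded subset of $W_2^s(\Omega)$, and $\zeta > 0$ holds, then $\sup_{\theta \in \Theta} \sup_{f \in \mathcal{F}} |\mathfrak{T}_k(v, \theta, f)|$ is Borel measurable on $V^k$.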
Proof. (a) Since $(x_1, \ldots, x_n) \mapsto \hat p_n(\cdot; x_1, \ldots, x_n)$ is a measurable map from $\Omega^n$ into $(\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$ by Theorem 5, since the map $p \mapsto \sqrt{n} \bigl(\int_\Omega p f \, d\lambda - P(f)\bigr)$ is $\|\cdot\|_\Omega$-continuous on $\mathcal{P}(t, \zeta, D)$ for every $f \in W_2^s(\Omega)$, and since every $f$ is clearly Borel measurable, we see that $\mathfrak{X}_n(x, f)$ as well as $\mathfrak{Y}_n(x, f)$ are Borel measurable on $\Omega^n$ for every $f \in W_2^s(\Omega)$. Furthermore, it is easy to see that $\mathfrak{X}_n(x, f)$ and $\mathfrak{Y}_n(x, f)$, and thus also $\mathfrak{X}_n(x, f) - \mathfrak{Y}_n(x, f)$, are continuous on $(\mathcal{F}, \|\cdot\|_\Omega)$ for given $x$. Since $(\mathcal{F}, \|\cdot\|_\Omega)$ is clearly separable, Borel measurability of the suprema in Part (a) follows.

(b) The first claim is proved completely analogously, making also use of the fact that $\rho$ is measurable in its first argument. The second claim is also proved analogously by showing that now $\mathfrak{U}_k(v, \theta, f)$ and $\mathfrak{V}_k(v, \theta, f)$ are continuous on the separable space $(\Theta \times \mathcal{F}, \|\cdot\| + \|\cdot\|_\Omega)$ for given $v$: for $\mathfrak{V}_k$ use that $\theta \mapsto \rho(v, \theta)$ is continuous on $\Theta$ by Assumption R.1 and that $\mathcal{F}$ is a sup-norm bounded set of continuous functions. For $\mathfrak{U}_k$ use the fact that $\theta \mapsto \tilde p_k(\theta)$ as a mapping from $\Theta$ into the space $(\mathcal{P}(t, \zeta, D), \|\cdot\|_\Omega)$ is continuous by Theorem 5 and that the same is true for $p_\theta$ in view of Assumption P.1, Proposition [55] in Appendix |3 and Remark

(c) Measurability of $\mathfrak{T}_k(\cdot, \theta, f)$ for $\theta \in \Theta$ and $f \in W_2^s(\Omega)$ follows from measurability of $f$ and $\rho(\cdot, \theta)$ and Remark 6(i). Continuity of $\mathfrak{T}_k(v, \cdot, \cdot)$ on the separable space $(\Theta \times \mathcal{F}, \|\cdot\| + \|\cdot\|_\Omega)$ follows from continuity of $\tilde p_k(\theta)(\cdot; v_1, \ldots, v_k)$ and $f(\cdot)$, Assumption R.1, and $\zeta > 0$. ■



E Appendix: Uniform Rates of Convergence and Entropy 
Bounds for Empirical Processes 

The subsequent theorem is a uniform version of Theorem 3.2.5 in van der Vaart and Wellner 
(1996). 

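Theorem 38 Let $(\Lambda, \mathcal{A}, P)$ be a probability space, $S$ and $T$ non-empty sets, and let $d$ be a non-negative real-valued function on $T \times T$. Consider a sequence of real-valued stochastic processes $(H_k(\sigma, \tau) : \sigma \in S, \tau \in T)$ defined on $(\Lambda, \mathcal{A})$ and a function $H : S \times T \to \mathbb{R}$ with the property that for every $\sigma \in S$ there exists a $\tau(\sigma) \in T$ such that for all $\tau \in T$

$$H(\sigma, \tau) - H(\sigma, \tau(\sigma)) \le -C d^\alpha(\tau, \tau(\sigma)) \tag{37}$$

holds, where $C, \alpha > 0$ are constants neither depending on $\sigma$ nor $\tau$. Suppose, for all $\delta > 0$,

$$E^* \sup_{\sigma \in S} \ \sup_{\tau \in T, \, d(\tau, \tau(\sigma)) \le \delta} \sqrt{k} \, |(H_k - H)(\sigma, \tau) - (H_k - H)(\sigma, \tau(\sigma))| \le \varphi_k(\delta) \tag{38}$$

is satisfied for real-valued functions $\varphi_k$ such that for some $\beta < \alpha$ the functions $\delta \mapsto \delta^{-\beta} \varphi_k(\delta)$ are all non-increasing in $\delta$. Assume further that, for every $\sigma \in S$, $\hat\tau_k(\sigma) : \Lambda \to T$ satisfies

$$H_k(\sigma, \hat\tau_k(\sigma)) \ge H_k(\sigma, \tau) \quad \text{for all } \tau \in T, \tag{39}$$

and let $r_k$ be a sequence of positive reals such that

$$\sup_{k \in \mathbb{N}} \frac{r_k^\alpha \varphi_k(r_k^{-1})}{\sqrt{k}} < \infty. \tag{40}$$

Then, for every $\sigma \in S$, $\tau(\sigma)$ is a maximizer of $H(\sigma, \cdot)$, and

$$\sup_{\sigma \in S} d(\hat\tau_k(\sigma), \tau(\sigma)) = O_P(r_k^{-1}) \quad \text{as } k \to \infty.$$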
Proof. We have to show that

$$\lim_{N \to \infty} \limsup_{k \to \infty} P^*\Bigl( r_k \sup_{\sigma \in S} d(\hat\tau_k(\sigma), \tau(\sigma)) > 2^N \Bigr) = 0.$$

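For $k, j \in \mathbb{N}$, set $V_{k,j} = \{(\sigma, \tau) : 2^{j-1} < r_k d(\tau, \tau(\sigma)) \le 2^j\}$. Then

$$r_k \sup_{\sigma \in S} d(\hat\tau_k(\sigma), \tau(\sigma)) > 2^N$$

implies that there is some $\sigma_0 \in S$ such that $r_k d(\hat\tau_k(\sigma_0), \tau(\sigma_0)) > 2^N$, which in turn gives $(\sigma_0, \hat\tau_k(\sigma_0)) \in V_{k,j_0}$ for some $j_0 > N$. Combine this with (37) and (39) to get

$$(H_k - H)(\sigma_0, \hat\tau_k(\sigma_0)) - (H_k - H)(\sigma_0, \tau(\sigma_0)) \ge C d^\alpha(\hat\tau_k(\sigma_0), \tau(\sigma_0)) > C r_k^{-\alpha} 2^{(j_0 - 1)\alpha}.$$

This implies

$$P^*\Bigl( r_k \sup_{\sigma \in S} d(\hat\tau_k(\sigma), \tau(\sigma)) > 2^N \Bigr) \le \sum_{j > N} P^*\Bigl( \sup_{(\sigma, \tau) \in V_{k,j}} \bigl|\sqrt{k}(H_k - H)(\sigma, \tau) - \sqrt{k}(H_k - H)(\sigma, \tau(\sigma))\bigr| \ge C \sqrt{k} \, r_k^{-\alpha} 2^{(j-1)\alpha} \Bigr).$$

Via Markov's inequality (for outer probability) and (38), the r.h.s. in the previous display can be bounded by

$$\sum_{j > N} \frac{\varphi_k(2^j / r_k)}{C \sqrt{k} \, r_k^{-\alpha} 2^{(j-1)\alpha}} \le \sum_{j > N} \frac{2^{j\beta} \varphi_k(r_k^{-1}) \, r_k^\alpha}{C \sqrt{k} \, 2^{(j-1)\alpha}} \le \frac{2^\alpha}{C} \Bigl( \sup_{k \in \mathbb{N}} \frac{r_k^\alpha \varphi_k(r_k^{-1})}{\sqrt{k}} \Bigr) \sum_{j > N} 2^{j(\beta - \alpha)},$$

where the first inequality follows from $\varphi_k(c\delta) \le c^\beta \varphi_k(\delta)$ for $c > 1$. Note that the upper bound is finite by (40) and does not depend on $k$; since $\sum_{j > N} 2^{j(\beta - \alpha)}$ converges to 0 as $N \to \infty$ as $\beta < \alpha$ holds, the proof is complete. ■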
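The last step of the proof uses only that the geometric tail $\sum_{j > N} 2^{j(\beta - \alpha)}$ vanishes as $N \to \infty$ when $\beta < \alpha$. A small sketch of the closed form, with a hypothetical function name and illustrative values of $\alpha$ and $\beta$:

```python
def geometric_tail(alpha: float, beta: float, n_start: int) -> float:
    """Closed form of sum_{j > n_start} 2**(j * (beta - alpha)), valid for beta < alpha."""
    if not beta < alpha:
        raise ValueError("the tail is finite only for beta < alpha")
    ratio = 2.0 ** (beta - alpha)  # common ratio, strictly between 0 and 1
    return ratio ** (n_start + 1) / (1.0 - ratio)


def partial_tail(alpha: float, beta: float, n_start: int, n_terms: int) -> float:
    """Direct summation of the first n_terms terms of the same tail."""
    return sum(2.0 ** (j * (beta - alpha)) for j in range(n_start + 1, n_start + 1 + n_terms))
```

For instance, with $\alpha = 2$ and $\beta = 1$ the tail starting after $N$ equals $2^{-N}$, which makes the rate at which the bound vanishes explicit.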

We next present an upper bound for $E^* \|\sqrt{n}(P_n - P)\|_{\mathcal{F}}$ for sup-norm bounded classes of functions $\mathcal{F}$. This result is essentially well-known, see Lemma 3.4.2 in van der Vaart and Wellner (1996), but we provide explicit constants. A proof, under the additional assumption that $Y_1, \ldots, Y_n$ are the coordinate projections on a product space, can be found in Gach (2010); inspection of the proof reveals that this assumption is unnecessary.

Theorem 39 Suppose $(\Lambda, \mathcal{A}, P)$ is a probability space, $Y_1, \ldots, Y_n$ are i.i.d. with law $P$, and $P_n$ denotes the empirical measure associated with $Y_1, \ldots, Y_n$. Let $\mathcal{F}$ be a non-empty class of $\mathcal{A}$-measurable functions on $\Lambda$, which are bounded by $B$, $0 < B < \infty$, in the sup-norm and by $\eta$, $0 < \eta < \infty$, with respect to $\|\cdot\|_{2,P}$. Then



E*||V^(P„-P)lb< (1696 + 64V2)/[](?7,^,|| • ||2,p) 



^ + ■ I|2,P) 



F Appendix: Auxiliary Results for SMD-Estimation 

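Lemma 40 Suppose $\mathcal{P}_\Theta \subseteq \mathcal{L}^2(\Omega)$ and $\theta \mapsto p_\theta$ is a continuous mapping from $\Theta$ into $(\mathcal{L}^2(\Omega), \|\cdot\|_2)$. Let $f : \Omega \to \mathbb{R}$ be an integrable function satisfying $\inf_{x \in \Omega} f(x) > 0$. Then

$$H(\theta) = \int_\Omega (f - p_\theta)^2 f^{-1} \, d\lambda$$

is a continuous real-valued function on $\Theta$.

Proof. Rewrite the integrand as $f - 2p_\theta + p_\theta^2 / f$, and note that each term is integrable by the hypotheses. Hence, $H$ is real-valued. For continuity, let $\theta_l, \theta \in \Theta$ be such that $\|\theta_l - \theta\|$ converges to 0. Letting $c = \inf_{x \in \Omega} f(x)$, and using that the $p_\theta$ are probability densities:

$$|H(\theta_l) - H(\theta)| = \Bigl| \int_\Omega (p_{\theta_l}^2 - p_\theta^2) f^{-1} \, d\lambda \Bigr| \le c^{-1} \|p_{\theta_l} - p_\theta\|_2 \bigl( \|p_{\theta_l} - p_\theta\|_2 + 2\|p_\theta\|_2 \bigr) \to 0 \quad \text{for } l \to \infty.$$

■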

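Proposition 41 (a) Suppose $\mathcal{P}_\Theta \subseteq \mathcal{L}^2(\Omega)$ and $\theta \mapsto p_\theta$ is a continuous map from $\Theta$ into $(\mathcal{L}^2(\Omega), \|\cdot\|_2)$. Then, on the event where $\inf_{x \in \Omega} \hat p_n(x) > 0$,

$$Q_n(\theta) = \int_\Omega (\hat p_n - p_\theta)^2 \hat p_n^{-1} \, d\lambda = \int_\Omega p_\theta^2 \, \hat p_n^{-1} \, d\lambda - 1$$

holds and $Q_n$ is a continuous real-valued function on $\Theta$. [In particular, in case $\zeta > 0$ holds, the above event is the entire sample space $\Omega^n$.]

(b) Let Assumption R.1 be satisfied. Then, on the event where $\inf_{x \in \Omega} \hat p_n(x) > 0$,

$$Q_{n,k}(\theta) = \int_\Omega (\hat p_n - \tilde p_k(\theta))^2 \hat p_n^{-1} \, d\lambda = \int_\Omega \tilde p_k^2(\theta) \, \hat p_n^{-1} \, d\lambda - 1$$

holds and $Q_{n,k}$ is a continuous real-valued function on $\Theta$. [In particular, in case $\zeta > 0$ holds, the above event is the entire sample space $\Omega^n \times V^k$.]

(c) Suppose $\mathcal{P}_\Theta \subseteq \mathcal{L}^2(\Omega)$ and $\theta \mapsto p_\theta$ is a continuous map from $\Theta$ into $(\mathcal{L}^2(\Omega), \|\cdot\|_2)$. If Assumption D.2 holds, then $Q$ is a continuous real-valued function on $\Theta$.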

Proof. Parts (a) and (c) are immediate consequences of Lemma 40. We next prove Part (b): Since $\hat p_n$ and $\tilde p_k(\theta)$ belong to $\mathcal{P}(t, \zeta, D)$ by construction, these densities are sup-norm bounded by $C_t D$. Hence, $Q_{n,k}$ is real-valued whenever $\inf_{x \in \Omega} \hat p_n(x) > 0$. Since the map $\theta \mapsto \tilde p_k(\theta)$ is continuous by Theorem 5(b), continuity of $Q_{n,k}$ then follows from the theorem of dominated convergence. ■
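The rewriting of the integrand used in the proof of Lemma 40, $(f - p_\theta)^2 / f = f - 2p_\theta + p_\theta^2 / f$, implies that the distance equals $\int_\Omega p_\theta^2 / f \, d\lambda - 1$ whenever $f$ and $p_\theta$ are both probability densities. The sketch below checks this numerically. Everything concrete in it is an assumption made for illustration: $\Omega = [0, 1]$, the linear family $p_\theta(x) = 1 + \theta(x - 1/2)$ with $|\theta| \le 1$, the stand-in density $f = p_{0.6}$, and the midpoint rule.

```python
N_CELLS = 2000  # midpoint-rule resolution (an arbitrary choice)
GRID = [(i + 0.5) / N_CELLS for i in range(N_CELLS)]


def integrate(g) -> float:
    """Midpoint rule on Omega = [0, 1]."""
    return sum(g(x) for x in GRID) / N_CELLS


def f_aux(x: float) -> float:
    """Hypothetical stand-in for the auxiliary density estimate (equals p_theta at theta = 0.6)."""
    return 1.0 + 0.6 * (x - 0.5)


def p(x: float, theta: float) -> float:
    """Hypothetical parametric density on [0, 1], positive for |theta| <= 1."""
    return 1.0 + theta * (x - 0.5)


def distance_direct(theta: float) -> float:
    """int (f - p_theta)^2 / f d(lambda), computed from the definition."""
    return integrate(lambda x: (f_aux(x) - p(x, theta)) ** 2 / f_aux(x))


def distance_expanded(theta: float) -> float:
    """int p_theta^2 / f d(lambda) - 1, the expanded form valid for densities."""
    return integrate(lambda x: p(x, theta) ** 2 / f_aux(x)) - 1.0
```

Both forms agree up to rounding, and the distance vanishes at $\theta = 0.6$, where the parametric density coincides with the stand-in $f$.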

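Proposition 42 (a) Suppose $\mathcal{P}_\Theta \subseteq \mathcal{L}^2(\Omega)$ and $\theta \mapsto p_\theta$ is a continuous map from $\Theta$ into $(\mathcal{L}^2(\Omega), \|\cdot\|_2)$. Let further Assumptions D.1 and D.2 be satisfied. Then

$$\sup_{\theta \in \Theta} |Q_n(\theta) - Q(\theta)| = o_P(1) \quad \text{as } n \to \infty.$$

(b) Let Assumptions D.1, D.2, P.1, \P.S\ and R.1 be satisfied. Then

$$\sup_{\theta \in \Theta} |Q_{n,k}(\theta) - Q(\theta)| = o_{P \otimes \mu}(1) \quad \text{as } \min(n, k) \to \infty.$$

(c) Suppose $\zeta > 0$ holds and Assumptions P.1 and lR.Si are satisfied. Then

$$\sup_{n \in \mathbb{N}} \ \sup_{x_1, \ldots, x_n \in \Omega} \ \sup_{\theta \in \Theta} |Q_{n,k}(\theta) - Q_n(\theta)| = O_\mu^*(k^{-t/(2t+1)}) \quad \text{as } k \to \infty.$$

If Assumption P.1 is strengthened to P.3 then

$$\sup_{n \in \mathbb{N}} \ \sup_{x_1, \ldots, x_n \in \Omega} \ \sup_{\theta \in \Theta} |Q_{n,k}(\theta) - Q_n(\theta)| = O_\mu^*(k^{-1/2}) \quad \text{as } k \to \infty.$$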
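Proof. (a) Set $\chi = \inf_{x \in \Omega} p_\blacktriangle(x)$ and observe that $\chi > 0$ by Assumption D.2. In view of Remark ^\) there is a sequence of events $A_n$ that have probability converging to 1 as $n \to \infty$ such that $\inf_{x \in \Omega} \hat p_n(x) > \chi$. On these events we then have

$$\sup_{\theta \in \Theta} |Q_n(\theta) - Q(\theta)| = \sup_{\theta \in \Theta} \Bigl| \int_\Omega \frac{p_\theta^2}{\hat p_n} \, d\lambda - \int_\Omega \frac{p_\theta^2}{p_\blacktriangle} \, d\lambda \Bigr| \le \chi^{-2} \sup_{\theta \in \Theta} \|p_\theta\|_2^2 \, \|\hat p_n - p_\blacktriangle\|_\Omega.$$

Since $\Theta$ is compact, the assumptions on $\mathcal{P}_\Theta$ imply that $\sup_{\theta \in \Theta} \|p_\theta\|_2^2 < \infty$. Part (a) of Theorem 5 now completes the proof.

(b) Let $\chi$ and $A_n$ be as in the proof of Part (a). On $A_n$ we have

$$\sup_{\theta \in \Theta} |Q_{n,k}(\theta) - Q(\theta)| = \sup_{\theta \in \Theta} \Bigl| \int_\Omega \frac{\tilde p_k^2(\theta)}{\hat p_n} \, d\lambda - \int_\Omega \frac{p_\theta^2}{p_\blacktriangle} \, d\lambda \Bigr|$$

$$\le \sup_{\theta \in \Theta} \Bigl| \int_\Omega \frac{\tilde p_k^2(\theta) - p_\theta^2}{\hat p_n} \, d\lambda \Bigr| + \sup_{\theta \in \Theta} \Bigl| \int_\Omega p_\theta^2 \Bigl( \frac{1}{\hat p_n} - \frac{1}{p_\blacktriangle} \Bigr) d\lambda \Bigr|$$

$$\le 2\chi^{-1} \sup_{\theta \in \Theta} \|\tilde p_k(\theta) - p_\theta\|_\Omega + \chi^{-2} D^2 \|\hat p_n - p_\blacktriangle\|_\Omega.$$

The result then follows from Parts (a) and (c) of Theorem 5.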

(c) Note that $\tilde p_k(\theta) \in \mathcal{P}(t, \zeta, D)$ by construction and $p_\theta \in \mathcal{P}(t, \zeta, D)$ by Assumption P.1. Hence, these densities are sup-norm bounded uniformly in $\theta$ (and $v_1, \ldots, v_k \in V$ in case of $\tilde p_k(\theta)$). Observe now that

$$Q_{n,k}(\theta) - Q_n(\theta) = \int_\Omega (\tilde p_k(\theta) - p_\theta) \frac{\tilde p_k(\theta) + p_\theta}{\hat p_n} \, d\lambda.$$

Using $\zeta > 0$, Part (d) of Proposition 1 applied to $\{\hat p_n : x_1, \ldots, x_n \in \Omega, n \in \mathbb{N}\}$ shows that $\{1/\hat p_n : x_1, \ldots, x_n \in \Omega, n \in \mathbb{N}\}$ is bounded in $W_2^t(\Omega)$. By Assumption P.1 and the construction of $\tilde p_k(\theta)$, it follows from Part (a) of Proposition 1 that

$$\Bigl\{ \frac{\tilde p_k(\theta) + p_\theta}{\hat p_n} : \theta \in \Theta, \ x_1, \ldots, x_n \in \Omega, \ v_1, \ldots, v_k \in V, \ n, k \in \mathbb{N} \Bigr\} \tag{41}$$

is contained in a Sobolev ball $U_{t,B}$ for some $B$ satisfying $0 < B < \infty$. The first claim then follows from Theorem [1^ with $s = 0$ (note that under $\zeta > 0$ Assumption P.1 implies Assumption P.2), where we have made use of the inequality $\int_\Omega |f| \, d\lambda \le \lambda(\Omega)^{1/2} \|f\|_2$ and the fact that the set in (41) is bounded in the sup-norm. If Assumption P.1 is strengthened to P.3 we may apply Part (c) of Theorem 15 with $\mathcal{F}$ equal to the set given in (41) to obtain the second claim. ■

Remark 43 If $\zeta > 0$ holds, then the events $A_n$ in Parts (a) and (b) of the above proof are the entire sample space and $Q_n - Q$, respectively $Q_{n,k} - Q$, is continuous on $\Theta$. By separability of $\Theta$, the measurability of the respective suprema then follows.

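Lemma 44 (a) Let Assumptions P.1 and P.5 be satisfied. Then, on the event $\inf_{x \in \Omega} \hat p_n(x) > 0$, the objective function $Q_n$ is twice continuously partially differentiable on $\Theta^\circ$ with

$$\frac{\partial Q_n}{\partial \theta_i}(\theta) = -2 \int_\Omega (\hat p_n - p_\theta) \frac{\partial p}{\partial \theta_i}(\cdot, \theta) \, \hat p_n^{-1} \, d\lambda,$$

$$\frac{\partial^2 Q_n}{\partial \theta_i \partial \theta_j}(\theta) = 2 \int_\Omega \Bigl( \frac{\partial p}{\partial \theta_i} \frac{\partial p}{\partial \theta_j} + p_\theta \frac{\partial^2 p}{\partial \theta_i \partial \theta_j} \Bigr)(\cdot, \theta) \, \hat p_n^{-1} \, d\lambda$$

for $i, j = 1, \ldots, m$.

(b) Let Assumptions P.1 and P.5 be satisfied. Then $Q$ is twice continuously partially differentiable on $\Theta^\circ$ with

$$\frac{\partial Q}{\partial \theta_i}(\theta) = -2 \int_\Omega (p_\blacktriangle - p_\theta) \frac{\partial p}{\partial \theta_i}(\cdot, \theta) \, p_\blacktriangle^{-1} \, d\lambda,$$

$$\frac{\partial^2 Q}{\partial \theta_i \partial \theta_j}(\theta) = 2 \int_\Omega \Bigl( \frac{\partial p}{\partial \theta_i} \frac{\partial p}{\partial \theta_j} + p_\theta \frac{\partial^2 p}{\partial \theta_i \partial \theta_j} \Bigr)(\cdot, \theta) \, p_\blacktriangle^{-1} \, d\lambda$$

for $i, j = 1, \ldots, m$.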

Proof. Note that the densities involved are all uniformly bounded by Assumption P.1. Under the respective assumptions, differentiation and integration can be interchanged, leading to the above formulae upon noting that the integral of $\partial^2 p / (\partial \theta_i \partial \theta_j)(\cdot, \theta)$ is zero. Continuity of the partial derivatives follows from the theorem of dominated convergence. ■
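The interchange of differentiation and integration can be illustrated by comparing finite differences of the objective with the integral expressions one obtains by differentiating under the integral sign. The sketch below is a hedged illustration only: it assumes a chi-square-type objective $Q(\theta) = \int_\Omega (f - p_\theta)^2 f^{-1} \, d\lambda$ with a fixed stand-in density $f$, the hypothetical scalar family $p_\theta(x) = 1 + \sin(\theta)(x - 1/2)$ on $\Omega = [0, 1]$, and a midpoint rule. The second-derivative expression uses that $\int_\Omega \partial^2 p / \partial \theta^2 \, d\lambda = 0$.

```python
import math

N_CELLS = 2000  # midpoint-rule resolution (an arbitrary choice)
GRID = [(i + 0.5) / N_CELLS for i in range(N_CELLS)]


def integrate(g) -> float:
    """Midpoint rule on Omega = [0, 1]."""
    return sum(g(x) for x in GRID) / N_CELLS


def f_aux(x: float) -> float:
    """Hypothetical stand-in for the fixed density in the objective."""
    return 1.0 + 0.6 * (x - 0.5)


def p(x: float, theta: float) -> float:
    """Hypothetical scalar-parameter density family on [0, 1]."""
    return 1.0 + math.sin(theta) * (x - 0.5)


def dp(x: float, theta: float) -> float:
    """First derivative of p with respect to theta."""
    return math.cos(theta) * (x - 0.5)


def d2p(x: float, theta: float) -> float:
    """Second derivative of p with respect to theta; it integrates to zero over [0, 1]."""
    return -math.sin(theta) * (x - 0.5)


def Q(theta: float) -> float:
    return integrate(lambda x: (f_aux(x) - p(x, theta)) ** 2 / f_aux(x))


def dQ(theta: float) -> float:
    """-2 int (f - p) dp / f, obtained by differentiating under the integral."""
    return -2.0 * integrate(lambda x: (f_aux(x) - p(x, theta)) * dp(x, theta) / f_aux(x))


def d2Q(theta: float) -> float:
    """2 int (dp^2 + p * d2p) / f, using that d2p integrates to zero."""
    return 2.0 * integrate(
        lambda x: (dp(x, theta) ** 2 + p(x, theta) * d2p(x, theta)) / f_aux(x)
    )


theta0 = 0.3
h1, h2 = 1e-4, 1e-3
fd_first = (Q(theta0 + h1) - Q(theta0 - h1)) / (2 * h1)
fd_second = (Q(theta0 + h2) - 2 * Q(theta0) + Q(theta0 - h2)) / h2 ** 2
```

Here `fd_first` and `fd_second` should match `dQ(theta0)` and `d2Q(theta0)` up to the finite-difference error.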

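Proposition 45 Let Assumptions D.1, P.1 and P.5 be satisfied and suppose $\zeta > 0$. Then, for all $i, j = 1, \ldots, m$,

$$\sup_{\theta \in \Theta^\circ} \Bigl| \frac{\partial^2 Q_n}{\partial \theta_i \partial \theta_j}(\theta) - \frac{\partial^2 Q}{\partial \theta_i \partial \theta_j}(\theta) \Bigr| = o_P(1) \quad \text{as } n \to \infty. \tag{42}$$

Proof. Let $b < \infty$ be a bound for all the integrals appearing in Assumption P.5. By Lemma 44 the l.h.s. of (42) is not larger than $2\zeta^{-2} b (1 + C_t D) \|\hat p_n - p_\blacktriangle\|_\Omega$, which converges to 0 in probability by Theorem 5(a). Measurability of the supremum in (42) follows from continuity of the second derivatives (Lemma 44) and separability of $\Theta^\circ$. ■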

Remark 46 If $\zeta = 0$ the assertion of the preceding proposition still holds true in outer probability under Assumptions D.1, D.2, P.1 and P.5 if $\partial^2 Q_n(\theta) / \partial\theta \partial\theta'$ is interpreted as the zero matrix on the event where $\inf_{x \in \Omega} \hat p_n(x) = 0$.